PREFACE TO THE FIRST EDITION

THE theory of factorial analysis is mathematical in nature, but
this book has been written so that it can, it is hoped, be read by 
those who have no mathematics beyond the usual secondary 
school knowledge. Readers are, however, urged to repeat 
some at least of the arithmetical calculations for themselves. 
It is probable that the subject-matter of this book may 
seem to teachers and administrators to be far removed from 
contact with the actual work of schools. I would like 
therefore to explain that the incentive to the study of 
factorial analysis comes in my case very largely from the 
practical desire to improve the selection of children for 
higher education. When I was thirteen years of age and 
finishing an elementary school education, I won a " scholar- 
ship " to a secondary school in the neighbouring town, one 
of the early precursors of the present-day " free places " 
in England. I have ever since then been greatly impressed 
by the influence that event has had on my life, and have 
spent a great deal of time in endeavouring to improve the 
methods of selecting pupils at that stage and in lessening 
the part played by chance. It was inevitable that I should 
be led to inquire into the use of intelligence tests for this 
purpose, and inevitable in due course that the possibilities 
of factorial analysis should also come under consideration. 
It seemed to me that before any practical use could be 
made of factorial analysis a very thoroughgoing examina- 
tion of its mathematical foundations was necessary. The 
present book is my attempt at this. ... It may seem remote 
from school problems. But much mathematical study and 
many calculations have to precede every improvement 
in engineering, and it will not be otherwise in the future 
with the social as well as with the physical sciences. 

GODFREY H. THOMSON 
MORAY HOUSE, 

UNIVERSITY OF EDINBURGH, 
November 1938 
PREFACE TO THE SECOND EDITION

THE former chapter on Simple Structure has been entirely 
rewritten and expanded into three chapters on Orthogonal 
Simple Structure, Oblique Factors, and Second-order 
Factors, with a corresponding expansion of section 19 of 
the mathematical appendix. A chapter on estimating 
factor loadings by the method of maximum likelihood and 
a corresponding section of the mathematical appendix 
have been supplied by Dr. D. N. Lawley. Many smaller 
changes have been made in the other chapters, which 
have in some cases been supplemented by Addenda at the 
end of the book. I owe Dr. Lawley thanks for much other 
assistance as well as for the above chapter, and I am again 
indebted to Dr. Walter Ledermann for stimulating dis- 
cussions of several points and especially for suggesting, 
by a remark of his, the geometrical interpretation of 
" structure " and " pattern." He and Mr. Emmett have 
again read the proofs, and the latter has made the new
index.

GODFREY H. THOMSON 
MORAY HOUSE, 

UNIVERSITY OF EDINBURGH, 
July 1945 

PREFACE TO THE THIRD EDITION 

CONDITIONS in the printing trade have again made it 
desirable to break as few pages as possible in making 
changes, and the page-numbering is unaltered, in the main, 
up to page 292, where three sections have had to be in- 
serted on the identity of oblique factors after univariate 
selection, on multivariate selection and simple structure, 
and on parallel proportional profiles. A small addition 
has been made in the mathematical appendix at the end of 
section 19, and a new section 9a added on relations between 
two sets of variates, corresponding to a new section in the 
text, on pages 100 and 101, on the conflict between battery 
reliability and prediction. Deletions have been made 
near page 150 to make room for a fuller explanation of
" degrees of freedom " and for the transfer to the text of 
the Addendum in the second edition about Fisher's z. 
Some changes and additions have been made in pages 165- 
170 concerning the number of significant factors in a cor- 
relation matrix, and a number of smaller changes here and 
there throughout. 

GODFREY H. THOMSON 
MORAY HOUSE, 

UNIVERSITY OF EDINBURGH, 
February 1948 



PREFACE TO THE FOURTH EDITION 

Two sections are added to Dr. Lawley's chapter (XXI), 
giving formulae for the standard errors of individual 
residuals, and of factor loadings, when maximum likeli- 
hood methods have been used. A section, 10a, on Leder- 
mann's shortened calculation of regression coefficients for 
estimating a man's factors now appears in the mathe- 
matical appendix (where it had inadvertently been 
omitted previously). I have added a section on estimating 
oblique factors to Chapter XVIII, and in section 19 of 
the mathematical appendix, I give the modifications 
necessary in Ledermann's shortened calculation when
oblique factors are in question. Other changes here and
there are only slight.

GODFREY H. THOMSON 
MORAY HOUSE, 

UNIVERSITY OF EDINBURGH, 
January 1949 



All science starts with hypotheses - in other words,
with assumptions that are unproved, while they may be, 
and often are, erroneous ; but which are better than 
nothing to the searcher after order in the maze of pheno- 
mena. 

T. H. HUXLEY 



PART I 
THE ANALYSIS OF TESTS 

To simplify and clarify the exposition, errors due to 
sampling the population of persons are in Parts I and II 
assumed to be non-existent. 



CHAPTER I 

THE THEORY OF TWO FACTORS 

1. Factor tests. The object of this book is to give some 
account of the " factorial analysis " of ability, as it is
called. In actual practice at the present day this science 
is endeavouring (with what hope of success is a matter of 
keen controversy) to arrive at an analysis of mind based 
on the mathematical treatment of experimental data 
obtained from tests of intelligence and of other qualities, 
and to improve vocational and scholastic advice and 
prediction by making use of this analysis in individual
cases. It is a development of the " testing " movement,
the movement in which experimenters endeavour to devise
tests of intelligence and other qualities in the hope of 
sorting mankind, and especially children, into different 
categories for various practical purposes ; educational (as 
in directing children into the school courses for which they 
are best suited) ; administrative (as in deciding that some 
persons are so weak-minded as to need lifelong institutional 
care) ; or vocational, etc. 

There are many psychologists who would deny that from 
the scores in such tests, or indeed from any analysis, we 
can (ever) return to a full picture of the individual ; and 
without entering into any discussion of the fundamental 
controversy which this denial reveals, everyone who has 
had anything to do with tests will readily agree that this 
is certainly so at present in practice. But the tester may 
be allowed to try to make his modest diagram of the 
individual better, more useful, and if possible simpler. 

Now, the broadest fact about the results of " tests " of 
all sorts, when a large number of them is given to a large
number of people, is that every individual and every test 
is different from every other, and yet that there are certain 
rather vague similarities which run through groups of 
people or groups of tests, not very well marked off from 




4 THE FACTORIAL ANALYSIS OF HUMAN ABILITY 

one another but merging imperceptibly into neighbouring 
groups at their margins. To describe an individual ac- 
curately and completely one would have to administer to 
him all the thousand and one tests which have been or 
may be devised, and record his score in each, an impossible 
plan to carry out, and an unwieldy record to use even if 
obtained. Both practical necessity and the desire for 
theoretical simplification lead one to seek for a few tests 
which will describe the individual with sufficient accuracy, 
and possibly with complete accuracy if the right tests can 
be found. If, as has been said, there is some tendency 
for the tests to fall into groups, perhaps one test from each 
group may suffice. Such a set of tests might then be said 
to measure the " factors " of the mind. 

2. Fictitious factors. Actually the progress of the 
" factorial " movement has been rather different, and the 
factors are not real but as it were fictitious tests which 
represent certain aspects of the whole mind. But con- 
ceivably it might have taken the more concrete form. In 
that case the " factor tests " finally decided upon (by 
whom, the reader will ask, and when " finally " ?) would 
be a set of standards which, like any other standards, would 
have to be kept inviolate, and unchanged except at rare 
intervals and for good reasons. Some tendency towards 
this there has been. The Binet scale of tests is almost an 
international standard, and there is a general agreement 
that it must not be changed except by certain people upon 
whose shoulders Binet's mantle has fallen, and only seldom 
and as little as possible even by them. But the Binet 
scale is a very complex entity, and rather represents many 
groups of tests than any one test. By " factor tests " one 
would more naturally mean tests of a " pure " nature, 
differing widely from one another so as to cover the whole 
personality adequately. And since actual tests always 
are more or less mixed, it is understandable why " factors " 
have come to be fictitious, not real, tests, to be each 
approximated to by various combinations of real tests so 
weighted that their unwanted aspects tend to cancel out, 
and their desired aspects to reinforce one another, the team 
approximating to a measure of the pure " factor." 
But how, the reader will ask, do we know a " pure "
factor, how are we to tell when the actual tests approximate 
to it ? To give a preliminary answer to that question we 
must go back to the pioneer work of Professor Charles 
Spearman in the early years of this century (Spearman, 
1904). The main idea which still, rightly or wrongly, 
dominates factorial analysis was enunciated then by him, 
and practically all that has been done since has been either 
inspired or provoked by his writings. His discovery was 
that the " coefficients of correlation " between tests tend 
to fall into " hierarchical order," and he saw that this 
could be explained by his famous " Theory of Two Factors." 
These technical terms we must now explain. 

3. Hierarchical order. A coefficient of correlation is a 
number which indicates the degree of resemblance between 
two sets of marks or scores. If a schoolmaster, for example, 
gives two examination papers to his class, say (1) in arith- 
metic and (2) in grammar, he will have two marks for every 
boy in the class. If the two sets of marks are identical 
the correlation is perfect, and the correlation coefficient, 
denoted by the symbol $r_{12}$, is said to be +1. If by some
curious chance the one list of marks is exactly like the
other one upside down (the best boy at arithmetic being
worst at grammar, and so on), the correlation is still perfect,
but negative, and $r_{12} = -1$. If there is absolutely no
resemblance between the two lists, $r_{12} = 0$. If there is a
strong resemblance, but falling short of identity, $r_{12}$ may
equal .9 ; and so on. There is a method (the Bravais-
Pearson) of calculating such coefficients, given the list of 
marks.* " Tests " can obviously be correlated just like 

* The " product-moment formula " is

$$r_{12} = \frac{\operatorname{sum}(x_1 x_2)}{\sqrt{\operatorname{sum}(x_1^2) \times \operatorname{sum}(x_2^2)}}$$

where $x_1$ and $x_2$ are the scores in the two tests, measured from the
average (so that approximately half the scores are negative), and
the sums are over the persons to whom the scores apply. The
quantity

$$\sigma_1^2 = \frac{\operatorname{sum}(x_1^2)}{\text{number of persons}}$$

is called the variance of Test 1, and $\sigma_1$ its standard deviation. If the
scores in each test are not only measured from their average, but
examinations, and a convenient form in which to write
down the intercorrelations of a number of tests is in a 
square chequer board with the names of the tests (say 
a, b, c . . .) written along the two margins, thus :

            a      b      c      d      e      f
  a         .     .48    .24    .54    .42    .30
  b        .48     .     .32    .72    .56    .40
  c        .24    .32     .     .36    .28    .20
  d        .54    .72    .36     .     .63    .45
  e        .42    .56    .28    .63     .     .35
  f        .30    .40    .20    .45    .35     .
  Totals  1.98   2.48   1.40   2.70   2.24   1.70



It was early found that such correlations tend to be 
positive, and it is of some interest to see which of a number 
of tests correlates most with the others. This can be found 
by adding up the columns of the chequer board, when we 
see in the above example that the column referring to 
Test d has the highest total (2.70). The tests can then be
rearranged and numbered in the order of these totals, thus :

              1      2      3      4      5      6
              d      b      e      a      f      c
  1   d       .     .72    .63    .54    .45    .36
  2   b      .72     .     .56    .48    .40    .32
  3   e      .63    .56     .     .42    .35    .28
  4   a      .54    .48    .42     .     .30    .24
  5   f      .45    .40    .35    .30     .     .20
  6   c      .36    .32    .28    .24    .20     .



After the tests have been thus arranged, the tendency 
which Professor Spearman was the first to notice, and which 

are then divided through by their standard deviation, they are said
to be standardized, and we represent them by $z_1$ and $z_2$. About
two-thirds of them, then, lie between plus and minus one. With
such scores Pearson's formula becomes

$$r_{12} = \frac{\text{sum of the products } z_1 z_2}{\text{number of persons } p}$$

In theoretical work, an even larger unit is used, namely $\sigma\sqrt{p}$.
With these units, the sum of the squares is unity, and the sum of the
products is the correlation coefficient. The scores are then said to
be normalized, but note that this does not mean distributed in a
" normal " or Gaussian manner.
he called " hierarchical order," is more easily seen. It is
the tendency for the coefficients in any two columns to have
a constant ratio throughout the column. Thus in our
example, if we fix our attention on Columns a and f, say,
they run (omitting the coefficients which have no partners) 
thus : 

     .54    .45
     .48    .40
     .42    .35
     .24    .20

and every number on the right is five-sixths of its partner 
on the left. 
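
The steps just described (adding up the columns, ranking the tests by their totals, and comparing two columns) can be sketched in a few lines of Python. This is only an illustration on the fictitious table above; the names `corr`, `column_total` and `column_ratios` are invented for the sketch and are not part of the original exposition.

```python
# Fictitious correlations from the first table (diagonal cells left empty).
TESTS = ["a", "b", "c", "d", "e", "f"]
R = {
    ("a", "b"): 0.48, ("a", "c"): 0.24, ("a", "d"): 0.54, ("a", "e"): 0.42, ("a", "f"): 0.30,
    ("b", "c"): 0.32, ("b", "d"): 0.72, ("b", "e"): 0.56, ("b", "f"): 0.40,
    ("c", "d"): 0.36, ("c", "e"): 0.28, ("c", "f"): 0.20,
    ("d", "e"): 0.63, ("d", "f"): 0.45,
    ("e", "f"): 0.35,
}


def corr(x, y):
    """Look up a correlation regardless of the order of the two tests."""
    return R[(x, y)] if (x, y) in R else R[(y, x)]


def column_total(t):
    """Sum of the correlations of test t with every other test."""
    return sum(corr(t, u) for u in TESTS if u != t)


def column_ratios(x, y):
    """Ratios r_yk / r_xk over the tests k that have a partner in both columns."""
    return [corr(y, k) / corr(x, k) for k in TESTS if k not in (x, y)]


# Rank the tests by column total, highest first.
order = sorted(TESTS, key=column_total, reverse=True)

# Compare Columns a and f, omitting the coefficients which have no partners.
ratios_a_f = column_ratios("a", "f")

print(order)
print([round(r, 4) for r in ratios_a_f])
```

With a perfectly hierarchical table every ratio in `ratios_a_f` is the same (here five-sixths); with experimental data the ratios would only be roughly constant.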

Our example is a fictitious one, and the tendency to 
hierarchical order in it has been made perfect in order to 
emphasize the point. It must not be supposed that the 
tendency is as clear in actual experimental data. Indeed, 
at the time there were some who denied altogether the 
existence of any such tendency in actual data. Those who 
did so were, however, mistaken, although the tendency is 
not as strong as Professor Spearman would seem originally 
to have thought (Spearman and Hart, 1912). The follow- 
ing is a small portion of an actual table of correlation coeffi- 
cients* from those days (Brown, 1910, 309). (Complete 
tables must, of course, include many more tests ; in recent 
work as many as 57 in one table.) 





          1      2      3      4      5      6
  1       .     .78    .45    .27    .59    .30
  2      .78     .     .48    .28    .51    .24
  3      .45    .48     .     .52    .40    .38
  4      .27    .28    .52     .     .41    .38
  5      .59    .51    .40    .41     .     .13
  6      .30    .24    .38    .38    .13     .





* In this, as in other instances where data for small examples are 
taken from experimental papers, neither criticism nor comment is 
in any way intended. Illustrations are restricted to few tests for 
economy of space and clearness of exposition, but in the experiments 
from which the data are taken many more tests are employed, and 
the purpose may be quite different from that of this book. 
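
As a check on the product-moment formula given in the footnote to Section 3, the following Python sketch computes a correlation in the three ways mentioned there: from scores measured from the average, from standardized scores, and from normalized scores. The two lists of marks are invented for illustration, and the function names are not from the original text.

```python
from math import sqrt


def product_moment(x, y):
    """Bravais-Pearson coefficient from scores measured from their averages."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def standardize(x):
    """Measure scores from the average and divide by the standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]


def normalize(x):
    """Use the larger unit sigma * sqrt(p), so that the sum of squares is unity."""
    return [z / sqrt(len(x)) for z in standardize(x)]


# Hypothetical marks of five boys in two papers (illustrative data only).
arithmetic = [12, 15, 9, 20, 14]
grammar = [10, 14, 8, 17, 16]
p = len(arithmetic)

r_raw = product_moment(arithmetic, grammar)
r_std = sum(a * b for a, b in zip(standardize(arithmetic), standardize(grammar))) / p
r_norm = sum(a * b for a, b in zip(normalize(arithmetic), normalize(grammar)))

print(round(r_raw, 4), round(r_std, 4), round(r_norm, 4))
```

The three values agree, and a list of marks correlated with itself turned upside down gives minus one, as the text states.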
4. G saturations. This tendency to " hierarchical order "
was explained by Professor Spearman by the hypothesis 
that all the correlations were due to one " factor " only, 
present in every test, but present in largest amount in the 
test at the head of the hierarchy. This factor is his famous
" g," to which he gave only this algebraic name to avoid
making any suggestions as to its nature, although in some 
papers and in The Abilities of Man he has permitted himself 
to surmise what that nature might be. Each test had also 
a second factor present in it (but not to be found elsewhere, 
except indeed in very similar varieties of the same test), 
whence the name, " Theory of Two Factors " (really one
general factor, and innumerable second or specific factors).

It will be proved in the Mathematical Appendix* that 
this arrangement would actually give rise to " hierarchical 
order." Meanwhile this can at least be made plausible. 
For if Test d has that column of correlations (the first 
in our table) with the other tests solely because it is 
saturated with so-and-so much g ; and if Test b has less g 
in it than d has, it seems likely enough that b's column of
correlations will all be smaller in that same proportion.
We can, moreover, find what these " saturations " with g
are. For on the theory, each of our six tests contains the 
factor g, and another part which has nothing to do with 
causing correlation. Moreover, the higher the test is in 
the hierarchical ranking, the more it is " saturated " with g.
Imagine now a fictitious test which had no specific, a test 
for g and for nothing else, whose saturation with g is 100 per 
cent., or 1-0. This fictitious test would, of course, stand 
at the head of the hierarchy, above our six real tests, and 
its row of correlations with each of those tests (their 
" saturations ") would each be larger than any other in the
same column. What values would these saturations take ? 

Before we answer this, let us direct our attention to the 
diagonal cells of the " matrix " of correlations (as it is
called ; a matrix is just a square or oblong set of numbers),
cells which we have up to the present left blank. Since 
each number in our matrix represents the correlation of the 
two tests in whose column and row it stands, there should 

* Para. 3 ; and see also Chapter XI, end of Section 2, page 175. 
be inserted in each diagonal cell the number unity, representing
the correlation of a test with its own identical self.
In these self-correlations, however, the specific factor of
each test, of course, plays its part. These self-correlations 
of unity are the only correlations in the whole table in 
which specifics do play any part. These " unities," there- 
fore, do not conform to the hierarchical rule of propor- 
tionality between the columns. 

But the case is different with the fictitious test of pure g. 
It has no specific, and its self-correlation of unity should 
conform to the hierarchy. If, therefore, we call the
" saturations " of the other tests $r_{1g}$, $r_{2g}$, $r_{3g}$, $r_{4g}$, $r_{5g}$, and $r_{6g}$,
we see that we must have, as we come down the first two
columns within the matrix

$$\frac{r_{1g}}{1} = \frac{.72}{r_{2g}} = \frac{.63}{r_{3g}} = \frac{.54}{r_{4g}} = \frac{.45}{r_{5g}} = \frac{.36}{r_{6g}}$$

and similar equations for each other column with the g
column, which together indicate that the six " saturations "
are

     .9    .8    .7    .6    .5    .4

Furthermore, each correlation in the table is the product
of two of these saturations. Thus

$$.72 = .9 \times .8$$
$$.42 = .7 \times .6$$
$$r_{34} = r_{3g} \times r_{4g}$$

The six tests can now be expressed in the form of
equations :

$$z_1 = .9g + .436s_1$$
$$z_2 = .8g + .600s_2$$
$$z_3 = .7g + .714s_3$$
$$z_4 = .6g + .800s_4$$
$$z_5 = .5g + .866s_5$$
$$z_6 = .4g + .917s_6$$


Herein, each z represents the score of some person in the
test indicated by the subscript, a score made up of that 
person's g and specific in the proportions indicated by the 
coefficients. The scores are supposed measured from the 
average of all persons, being reckoned plus if above the 
average and minus if below ; and so too are the factors g 
and the specifics. And each of them, tests and factors, is 
" standardized," i.e. measured in such units that the sum 
of the squares of all the scores equals the number of 
persons. This is achieved by dividing the raw scores by the 
" standard deviation." The saturations of the specifics 
are such that the sum of the squares of both saturations 
comes in each test to unity, the whole variance of that test. 
Thus

$$.9^2 + .436^2 = 1.00$$
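
The arithmetic of this section can be retraced in Python. The sketch below is an illustration only: it relies on the statement above that each correlation is the product of two saturations, from which it follows that $r_{ij} r_{ik} / r_{jk} = r_{ig}^2$ for any three different tests. The averaging over all pairs and the name `g_saturation` are choices made for this sketch, not a procedure laid down in the text.

```python
from itertools import combinations
from math import sqrt

# Hierarchical table in ranked order (tests 1 to 6 are d, b, e, a, f, c).
R = [
    [None, 0.72, 0.63, 0.54, 0.45, 0.36],
    [0.72, None, 0.56, 0.48, 0.40, 0.32],
    [0.63, 0.56, None, 0.42, 0.35, 0.28],
    [0.54, 0.48, 0.42, None, 0.30, 0.24],
    [0.45, 0.40, 0.35, 0.30, None, 0.20],
    [0.36, 0.32, 0.28, 0.24, 0.20, None],
]


def g_saturation(i):
    """Root of the mean of r_ij * r_ik / r_jk over all pairs j, k of other tests."""
    others = [t for t in range(len(R)) if t != i]
    estimates = [R[i][j] * R[i][k] / R[j][k] for j, k in combinations(others, 2)]
    return sqrt(sum(estimates) / len(estimates))


saturations = [g_saturation(i) for i in range(len(R))]

# Loading of each specific, chosen so that the two squares sum to unity.
specific_loadings = [sqrt(1 - r * r) for r in saturations]

# Largest gap between a tabled correlation and the product of two saturations.
largest_gap = max(
    abs(R[i][j] - saturations[i] * saturations[j])
    for i, j in combinations(range(len(R)), 2)
)

print([round(r, 3) for r in saturations])
print([round(s, 3) for s in specific_loadings])
```

Because the example is perfectly hierarchical, every pair of tests gives the same estimate; with real data the estimates would differ and some averaging of this kind would be needed.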



5. A weighted battery. This brief outline of the Theory 
of Two Factors must for the moment suffice. It is 
enough to enable the question to be answered which at the 
end of our Section 2 led to the digression. " How," the 
reader asked, " do we know a pure factor, how are we to 
tell when the actual tests approximate to it ? " In the 
Two-factor Theory the important pure factor was g itself, 
and a test approximated to it the more, the higher it stood 
in the hierarchy. Its accuracy of measurement of g was 
indicated by its " saturation." And a battery of hier- 
archical tests could be weighted so as to have a combined 
saturation higher than that of any one member, each test 
for this purpose being weighted (as will be shown in Chapter
VII) by a number proportional to $\frac{r_{ig}}{1 - r_{ig}^2}$, where $r_{ig}$ is the
g saturation of Test i (Abilities, p. xix). Although g
remained a fiction, yet a complex test, made up of a 
weighted battery of tests which were hierarchical, could 
approach nearer and nearer to measuring it exactly, as 
more tests were added to the hierarchy. Each test added 
would have to conform to the rule of proportionality in its 
correlations with the pre-existing battery. If it did not 
do so it would have to be rejected. The battery at any 
stage would form a kind of definition of g, which it
approached although never reached. And a man's weighted
score in such a battery would be an estimate of his amount 
of g, his general intelligence. The factorial description of 
a man was at this period confined to one factor, since the 
specific factors were useless as description of any man. 
For one thing, they were innumerable ; and for another, 
being specific, they were only able to indicate how the man 
would perform in the very tests in which, as a matter of 
fact, we knew exactly how he had performed. 
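
The effect of weighting can be illustrated numerically with the six fictitious tests of Section 4. The sketch below assumes, as in that section, that each standardized test is its saturation times g plus a specific, and further assumes that g and all the specifics are uncorrelated with unit variance; the expression used for the saturation of the battery follows from those assumptions and is not quoted from the text. The function name `battery_saturation` is hypothetical.

```python
from math import sqrt

# g saturations of the six fictitious tests of Section 4.
saturations = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]


def battery_saturation(weights, sats):
    """Correlation with g of a weighted sum of standardized tests.

    Assumes g and the specifics are mutually uncorrelated with unit variance.
    """
    # Part of the weighted sum carried by g (also its covariance with g).
    common = sum(w * r for w, r in zip(weights, sats))
    # Variance contributed by the specifics.
    specific = sum(w * w * (1 - r * r) for w, r in zip(weights, sats))
    return common / sqrt(common ** 2 + specific)


# Weights proportional to r / (1 - r^2), compared with equal weights.
spearman_weights = [r / (1 - r * r) for r in saturations]
equal_weights = [1.0] * len(saturations)

weighted = battery_saturation(spearman_weights, saturations)
unweighted = battery_saturation(equal_weights, saturations)

print(round(weighted, 4), round(unweighted, 4))
```

Under these assumptions both batteries exceed the best single test (.9), and the weighted battery exceeds the unweighted one.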

6. Oval diagrams. It is convenient at this point to
introduce a diagrammatic illustration which will be useful
in the less technical part of this book, although like all
illustrations it must be taken only as such, and the analogy
must not be pushed too far. If we represent the two
abilities, which are measured by tests, by two overlapping
ovals as in Figure 1, then the amount of the overlap can be
made to represent the degree to which these tests are
correlated. If we call the whole area of each oval the
" variance " of that ability, we shall be introducing the
reader to another technical term (of which a definition was
given in the footnote to page 5). Here it need mean nothing
more than the whole " amount " of the ability. The overlap
we shall call the " covariance." If the two variances are
each equal to unity, then the covariance is the correlation
coefficient. To make the diagram quantitative, we can
indicate in figures the contents of each part of the
variance, as in the instance shown, which gives a
correlation of .6. If the separate parts of each variance
(i.e. of each oval) do not add up to the same quantity, but
to $v_1$ and $v_2$, say, then the covariance (the amount in
the overlap) must be divided by $\sqrt{v_1 v_2}$ in order to
give the correlation. Thus, Figure 2 represents a correlation
of $3 \div \sqrt{4 \times 9} = .5$. No attempt is made in the
diagrams to make the actual areas proportional to the parts
of the variance ; it is the numbers written in each cell
which matter.

[Figure 1.]   [Figure 2.]   [Figure 3.]
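
The rule for reading a correlation off an oval diagram is short enough to state as a function. This is a minimal sketch; the name `correlation_from_overlap` is invented here.

```python
from math import sqrt


def correlation_from_overlap(covariance, variance_1, variance_2):
    """Overlap divided by the square root of the product of the two whole ovals."""
    return covariance / sqrt(variance_1 * variance_2)


# Figure 2 of the text: an overlap of 3 between ovals whose totals are 4 and 9.
figure_2 = correlation_from_overlap(3, 4, 9)

# When both variances are unity the covariance is itself the correlation.
unit_case = correlation_from_overlap(0.6, 1, 1)

print(figure_2, unit_case)
```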

The four abilities represented by four tests can clearly 
overlap in a complicated way, as in Figure 3, which shows 
one part of the variance (marked g) common to all four of 
the tests ; four parts (left unshaded) each common to three 
tests ; six parts (shaded) each common to two tests ; and 
four outer parts (marked s) each specific to one test only. 
The early Theory of Two Factors adopted the hypothesis 
that, except for very similar varieties of the one test, none 
of the cells of such a diagram had any contents save those 
marked g and s, the general and the specific factors. The 
" variance " of each ability was in that theory completely 
accounted for by the variance due to g, and the variance 
due to s. 

7. Tetrad-differences. In Section 3 it was explained that 
the discovery made by Professor Spearman was that the 
correlation coefficients in two columns tend to be in the 
same ratio as we go up and down the pair of columns. 
That is to say, if we take the columns belonging to Tests 
b and f, and fix our attention on the correlations which
b and f make with d and e, we have :

            b      f
     d     .72    .45
     e     .56    .35

where

$$\frac{.72}{.45} = \frac{.56}{.35}$$

This may be written

$$.72 \times .35 - .45 \times .56 = 0$$

and in this form is called a " tetrad-difference." In
symbols this one is

$$r_{bd} r_{ef} - r_{df} r_{be} = 0$$

Spearman's discovery may therefore be put thus : " The 
tetrad-differences are, or tend to be, zero." It is clear that 
this will be so if, as we said was the case in the Theory of
Two Factors, each correlation is the product of two
correlations with g. For then the above tetrad-difference
becomes

$$r_{bg} r_{dg} \cdot r_{eg} r_{fg} - r_{dg} r_{fg} \cdot r_{bg} r_{eg}$$

which is identically zero. The present-day test for
hierarchical order in a correlation matrix is to calculate all the
tetrad-differences (always avoiding the main diagonal) and
see if they are sufficiently small. If they are, then the 
correlations can be explained by a diagram of the same 
nature as Figure 3, by one general factor and specifics. It 
is, of course, not to be expected in actual experimenting 
that the tetrad-differences will be exactly zero ; no experi- 
ment on human material can be as accurate as that. What 
is required is that they shall be clustered round zero in a 
narrow curve, falling off steadily in frequency as zero is 
departed from. The number of tetrad-differences increases 
very rapidly as the number of tests grows, and in an actual 
experimental battery the tetrads are very numerous indeed.
In the small portion of a real correlation table given above 
(page 7), with six tests, there are 45 tetrad-differences,* 
and in this instance they are distributed as follows (taking 
absolute values only and disregarding signs, which can be 
changed by altering the order of the tests) : 

From .0000 to .0999, 28 tetrad-differences.
From .1000 to .1999, 13 tetrad-differences.
From .2000 to .2796, 4 tetrad-differences.
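
The counting above can be reproduced mechanically. The following Python sketch takes the six-test portion of Brown's table quoted in Section 3, forms the three tetrad-differences belonging to each set of four tests, and sorts their absolute values into the same three classes. The helper names `r` and `tetrad_differences` are illustrative only.

```python
from itertools import combinations

# Portion of Brown's (1910) table quoted in Section 3, tests numbered 1 to 6.
R = {
    (1, 2): 0.78, (1, 3): 0.45, (1, 4): 0.27, (1, 5): 0.59, (1, 6): 0.30,
    (2, 3): 0.48, (2, 4): 0.28, (2, 5): 0.51, (2, 6): 0.24,
    (3, 4): 0.52, (3, 5): 0.40, (3, 6): 0.38,
    (4, 5): 0.41, (4, 6): 0.38,
    (5, 6): 0.13,
}


def r(i, j):
    """Correlation of tests i and j, whichever order they are given in."""
    return R[(min(i, j), max(i, j))]


def tetrad_differences(tests):
    """The three tetrad-differences for every set of four tests."""
    diffs = []
    for i, j, k, l in combinations(tests, 4):
        a = r(i, j) * r(k, l)
        b = r(i, k) * r(j, l)
        c = r(i, l) * r(j, k)
        diffs.extend([a - b, a - c, b - c])
    return diffs


# Absolute values only, disregarding signs as in the text.
diffs = [abs(d) for d in tetrad_differences(range(1, 7))]

# Count the tetrad-differences falling in each of the three classes.
classes = [(0.0, 0.1), (0.1, 0.2), (0.2, 0.3)]
counts = [sum(1 for d in diffs if low <= d < high) for low, high in classes]

print(len(diffs), counts, round(max(diffs), 4))
```

Six tests give fifteen sets of four, and each set yields three tetrad-differences, which accounts for the total of 45.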

This distribution of tetrads can be represented by a 
" histogram " like that shown in Figure 4, which explains 
itself. It is clear that some criterion is required by which 
we can know whether the distribution of tetrad-differences, 
after they have been calculated, is narrow enough to justify 
us in assuming the Theory of Two Factors. This criterion 
is explained in Part III of this book. One form of it consists 
in drawing a distribution curve to which, on grounds of 
sampling, the tetrad-differences may be expected to conform.
Any tetrad-differences which seem to be too large

* Not all independent. 
to be accounted for by the Theory of Two Factors are then
examined, to see whether the tests giving them have any 

special points of resemblance, 
in content, method, or other- 
wise, which may explain why 
they disturb the hierarchy. 

8. Group factors. As time 
went on it became clear that 
the tendency to zero tetrad- 
differences, though strong, was 
not universal enough to permit 
an explanation of all correla- 
tions between tests in terms of 
g and specifics, with a few 
slight " disturbers " in the form of slightly overlapping 
specifics. It became necessary to call in group factors, 
which run through many though not through all tests, 
to explain the deviations from strict hierarchical order. 
The Spearman school of experimenters, however, tend 
always to explain as much as possible by one central
factor, and to use group factors only when necessitated. 
They take the point of view that a group factor must as 
it were establish its right to existence, that the onus of 
proof is on him who asserts a group factor. As a tiny 
artificial illustration, a matrix of correlation coefficients : 



1 
2 
3 

4 



would be examined, and its three tetrad-differences found 
to be : 

zero 

-15 

15 

Inspection shows that the correlation r 23 is the cause of the 
discrepancies from zero, and the experimenter trained in 
the Two-factor school would therefore explain these
correlations by a central factor running through them all, 
plus a special link joining Tests 2 and 3, as in Figure 5. 

There are innumerable other possible ways of explaining 
these same correlations. For example, the linkages be- 
tween the tests might be as in Figure 6, which gives exactly 
the same correlations. This lack 
of uniqueness is something which 
must always be borne in mind 
in studying factorial analysis. 
There are always, as here, in- 
numerable possible analyses, and 
the final decision between them 
has to be made on some other 
grounds. The decision may be 
psychological, as when for ex- 
ample in the above case an 
experimenter chooses one of the 
possible diagrams because it best 
agrees with his psychological 
ideas about the tests. Or the 
decision may be made on the 
ground that we should be par- 
simonious in our invention of 
" factors," and that where one
general and one group factor will 
serve we should not invent five 
group factors as required by 
Figure 6. Both diagrams, however, fit the correlational 
facts exactly, and so also would hundreds of other diagrams 
which might be made. As has been said, the two- 
factor tendency is to take the diagram with the largest 
general factor (and the largest specifics also) and with as 
few group factors as possible. 

[Figure 6.]

9. The verbal factor. In this way the Theory of Two
Factors has gradually extended the " two " to include, in
addition to g and specifics, a number of other group factors,
still, however, comparatively few. These group factors bear
such names as the verbal factor v, a mechanical factor m,
an arithmetic factor, perseveration, etc. The characteristic
method of the Two-factor school can be well
seen, without any technical difficulties unduly obscuring 
the situation, in the search for a verbal factor. The idea 
that, in addition to a man's g (which is generally thought 
of as something innate) there may be an acquired factor 
of verbal facility which enables him to do well in certain 
tests, is a not unnatural one. A battery of tests can be 
assembled, of which half do, and half do not, employ words 
in their construction or solution. The correlation matrix 
will then have four quadrants, the quadrant V containing 
the correlations of the verbal tests among themselves, the
quadrant P the correlations of the non-verbal or, say,
pictorial tests, and the quadrants C containing the cross- 
correlations of the one kind of test with the other. If the 
whole table is sufficiently " hierarchical," there is no 
evidence for a group factor v or a group factor p. If 
either of these factors exists, there will be differences to be 
noticed between the six kinds of tetrad which can be 
chosen, namely : 



         p  p             v  v             p  p
     v   .  .         v   x  x         p   x  x
     v   .  .         v   x  x         p   x  x
        (1)              (2)              (3)

         v  p             v  p             v  p
     v   x  .         p   .  x         v   x  .
     v   x  .         p   .  x         p   .  x
        (4)              (5)              (6)



A tetrad like 1, with two verbal tests along one margin 
and two pictorial tests along the other, will be found in
quadrant C. Neither a factor common to the verbal tests
only, nor one common to the pictorial tests only, will add 
anything to any of the four correlations in such a tetrad- 
difference, which may be expected, therefore, to tend to be 
zero. If the tetrads in C seem to do so, the other tetrads 
can be examined. Tetrad 2 is taken wholly from the V 
quadrant. In it the verbal factor, if any is present, will 
reinforce all the four correlations, and should not therefore 
disturb very much the tendency to a zero tetrad-difference.
(Reinforced correlations are marked by x in the diagrams.) 
The same is true of Tetrad 3 taken wholly from the P 
quadrant. Tetrads 4 and 5 have each two of their cor- 
relations reinforced, by the v factor in 4 and by the p 
factor in 5, but in each case in such a way as not to change 
very much the tetrad-difference. It is when we come to
tetrads like 6, which have one correlation in each of the 
four quadrants, that the presence of either or both factors 
should show itself strongly : for the two reinforced correlations
here occur on a diagonal, and inflate only the one
member of the tetrad-difference

$$ r_{vv'}\,r_{pp'} - r_{vp'}\,r_{pv'} $$

If, then, a verbal factor, and also a pictorial factor, are 
present, the tendency for the tetrad-differences to vanish 
should become less and less strong as we consider tetrads 
of the kinds 1, 2 and 3, 4 and 5, and especially 6, where 
the tetrad-differences should leap up. If only the verbal 
factor is present, tetrad-differences of the kind 3 should 
vanish rather more than those of the kind 2. But it will 
not be easy to distinguish between either suspected factor, 
and both. Tetrads like 6, however, should give conclusive 
evidence of the presence of one or the other, if not both. 
Methods like this were employed by Miss Davey (Davey, 
1926), who found a group factor, but not one running 
through all the verbal tests, and by Dr. Stephenson 
(Stephenson, 1931), whose results indicated the presence 
of a verbal factor.* 

* T. L. Kelley had already found by other methods strong evidence
of a verbal factor (Kelley, 1928, 104, 121 et passim).

10. Group-factor saturations. Just as the g saturations
of tests can be calculated, so also can the saturation of a
test with any group factor it may contain. The general 
method of the Two-factor school is first to work with 
batteries of tests which give no unduly large tetrad- 
differences, and which also appear to satisfy one's general 
impression that they test intelligence. From such a 
battery, of which the best example is that of Brown and 
Stephenson (B. and S., 1933), the g saturations can be 
calculated.* Each test has, however, also its specific, which, 
so long as it is in the hierarchical battery, is unique to it and 
shared with no other member of the battery. A test may 
now be associated with some other battery of different 
tests, and with some of these it may share a part of its 
former specific, as a group factor which will increase its 
correlation beyond that caused by g. The excess correlation
enables the saturation of the test with this group
factor to be found (the details are too technical for this
chapter) and the specific saturation correspondingly
reduced. Finally, the tester may be able to give the
composition of a test as, let us say (to invent an example) 

  .71g + .40v + .34n + .47s

where g is Spearman's g, v is Stephenson's verbal factor, 
n is a number factor, and s is the remaining specific of the 
test. The coefficients are the " saturations " of the test 
with each of these ; that is, the correlations believed to exist 
between the test and these fictitious tests called factors. 
The squares of these saturations represent the fractions of 
the test-variance contributed by each factor, and these 
squares sum to unity, thus : 

Factor    Saturation Squared
  g            .5041
  v            .1600
  n            .1156
  s            .2209
              ------
              1.0006
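The arithmetic of this invented example can be checked in a few lines. What follows is only a minimal sketch in Python, not part of the original text; the names `saturations` and `variance_shares` are chosen here for illustration.

```python
# Saturations of the invented example: g, v, n and the remaining specific s.
saturations = {"g": 0.71, "v": 0.40, "n": 0.34, "s": 0.47}

# The square of each saturation is the fraction of the test-variance
# contributed by that factor.
variance_shares = {name: round(value ** 2, 4) for name, value in saturations.items()}
total = round(sum(variance_shares.values()), 4)

for name, share in variance_shares.items():
    print(f"{name}  {share:.4f}")
# The total is close to unity; the small excess comes from the
# two-decimal rounding of the saturations.
print(f"sum {total:.4f}")
```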

* For the sake of clarity the text here rather oversimplifies the 
situation. The battery of Brown and Stephenson contains in fact
a rather large group factor as well as g and specifics.

11. The bifactor method. Holzinger's Bifactor Method
(Holzinger, 1935, 1937a) may be looked upon as another 
natural extension of the simple Two-factor plan of analysis. 
It endeavours to analyse a battery of tests into one general 
factor and a number of mutually exclusive group factors. 
A diagram of such an analysis looks like a " hollow staircase,"
thus :

Here factor g runs through all, as is indicated by the
column of crosses. Factors h, k, and l run through mutually
exclusive groups of tests each. The saturations with
g can be calculated from sub-batteries of tests which form 
perfect hierarchies, by selecting only one test from each 
group (in every possible way). After these are known, 
the correlation due to g can be removed, and then the 
saturations due to each group factor found.* 

12. Vocational guidance. It will clearly be an aim of the 
experimenter along all these lines to obtain if possible 
single real tests, or failing that weighted batteries of tests, 
which approximate as closely as possible to the factors he 
has found, or postulated ; and with these to estimate the 
amount of each factor possessed by any man, and also (by 
giving such tests to tried workmen or school pupils) to 
estimate the amount of each factor required by different 
" occupations " (including higher education) with a view to 
vocational and educational selection and guidance. 

* See also the Addendum, page 348.



CHAPTER II 

MULTIPLE-FACTOR ANALYSIS 

1. Need of group factors. The two-factor method of 
analysis, described in the last chapter, began with the idea 
that a matrix of correlations would ordinarily show perfect 
hierarchical order if care was taken to avoid tests which 
were " unduly similar," i.e. very similar indeed to one 
another. If such were found coexisting in the team of 
tests, the team had to be " purified " by the rejection of 
one or other of the two. Later it became clear that this 
process involves the experimenter in great difficulty, for it 
subjects him to the temptation to discover " undue simi- 
larity " between tests after he has found that their correla- 
tion breaks the hierarchy. Moreover, whole groups of 
tests were found to fail to conform ; and so group factors 
were admitted, though always, by the experimenter trained 
in that school, with reluctance and in as small a number as 
possible. It had, however, become quite clear that the 
Theory of Two Factors in its original form had been super- 
seded by a theory of many factors, although the method 
of two factors remained as an analytical device for 
indicating their presence and for isolating them in com- 
parative purity. 

Under these circumstances it is not surprising that some 
workers turned their attention to the possibility of a method 
of multiple-factor analysis, by which any matrix of test 
correlations could be analysed direct into its factors 
(Garnett, 1919a and b). It was Professor Thurstone of 
Chicago who saw that one solution to this problem could 
be reached by a generalization of Spearman's idea of zero 
tetrad-differences.

2. Rank of a matrix and number of factors. We saw that 
when all the tetrad-differences are zero, the correlations 
can all be explained by one general factor, a tetrad being
formed of the intercorrelations of two tests with two other
tests, thus :

          3        4
  1     r_13     r_14
  2     r_23     r_24

and the tetrad-difference being

$$ r_{13}\,r_{24} - r_{23}\,r_{14} $$

Thurstone's idea, though rather differently expressed by
him (Vectors, Chapter II), can be based on a second, third,
fourth . . . calculation of certain tetrad-differences of
tetrad-differences.

To explain this, let us consider the correlation coefficients
which three tests make with three others :

          4        5        6
  1     r_14     r_15     r_16
  2     r_24     r_25     r_26
  3     r_34     r_35     r_36



This arrangement of nine correlation coefficients might 
have been called a " nonad," by analogy with the tetrad.
Actually, by mathematicians, it is called a " minor deter- 
minant of order three " or more briefly a three-rowed 
minor ; a tetrad is in this nomenclature a " minor of order 
two." 

We can now, on the above three-rowed determinant, 
perform the following calculation. Choose the top left 
coefficient as " pivot," and calculate the four tetrad- 
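differences of which it forms part, namely :

$$ r_{14} r_{25} - r_{24} r_{15}, \qquad r_{14} r_{26} - r_{24} r_{16}, $$
$$ r_{14} r_{35} - r_{34} r_{15}, \qquad r_{14} r_{36} - r_{34} r_{16}. $$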

These four tetrad-differences now themselves form a 
tetrad which can be evaluated. If it is zero, we say that 
the three-rowed determinant with which we started 
" vanishes." 

Exactly the same repeated process can be carried on with 
larger minor determinants. For example, the minor of 
order four here shown vanishes :

[The array is only partly legible in this copy. Surviving entries
of the four-rowed minor, in the order in which they appear :
.34 .72 .46 .60 (.26) .62 .44 .66 .45 .58 .63 ; surviving
pivotal t.d.'s of the first condensation : (.0408) .0016 .0204
.0044 .0444 .0300.]

The second condensation gives

  (.00021216)   .00031824
   .00028288    .00042432

and finally zero.

This process of continually calculating tetrads is called 
" pivotal condensation." The reader should be given a 
word of warning here, that the end result of this form of 
calculation, if not zero, has to be divided by the product of 
certain powers of the pivots, to give the value of the deter- 
minant we began with. A routine method (Aitken, 1937a) 
of carrying out pivotal condensation, including division 
by the pivot at each step, is described in Chapter VI, 
pages 89 ff.* 
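Pivotal condensation, including the division by powers of the pivots mentioned above, can be sketched as follows. This is an illustration in Python and not Aitken's routine layout; the function names and the two small integer arrays are invented for the purpose, and exact fractions are used so that a vanishing determinant comes out as exactly zero.

```python
from fractions import Fraction

def condense(matrix):
    """One condensation: the array of tetrad-differences that the
    top-left pivot forms with every other row and column."""
    pivot = matrix[0][0]
    size = len(matrix)
    return [[pivot * matrix[i][j] - matrix[i][0] * matrix[0][j]
             for j in range(1, size)]
            for i in range(1, size)]

def determinant_by_condensation(matrix):
    """Condense down to one number, then divide by the pivots: an
    n-rowed array contributes its pivot to the power n - 2."""
    m = [[Fraction(x) for x in row] for row in matrix]
    divisor = Fraction(1)
    while len(m) > 1:
        pivot = m[0][0]
        if len(m) > 2:
            if pivot == 0:
                raise ValueError("zero pivot: rearrange rows or columns first")
            divisor *= pivot ** (len(m) - 2)
        m = condense(m)
    return m[0][0] / divisor

# Hypothetical arrays, chosen only to make the arithmetic easy to follow.
example = [[2, 1, 3], [4, 5, 6], [7, 8, 10]]
vanishing = [[1, 2, 3], [2, 5, 7], [3, 7, 10]]   # third row = first + second

print(condense(example))                       # the four pivotal tetrad-differences
print(determinant_by_condensation(example))    # value of the determinant
print(determinant_by_condensation(vanishing))  # a vanishing minor gives zero
```

For a vanishing minor the division is immaterial, which is why the text can speak simply of continuing to calculate tetrads until zero is reached.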

We can in this way examine the minors of orders two, 
three, four (and so on) of a correlation matrix, always 
avoiding those diagonal cells which correspond to the 
correlation of a test with itself. We may come to a point 
at which all the minors of that order vanish. Suppose these 
minors which all vanish are the minors of order five. We 
then say that the " rank " of the correlation matrix is four 
(with the exception of the diagonal cells). There then 
exists the possibility that the " rank " of the whole corre- 
lation matrix can be reduced to four by inserting suitable 
quantities in the diagonal cells (see next section). The
" rank " of a matrix is the order of its largest † non-vanishing
minor. Thurstone's discovery was that the tests could
be analysed into as many common factors as the above
reduced rank of their correlation matrix (the rank, that
is to say, apart from the diagonal cells) plus a specific in
each test. He also invented a method of performing the
analysis.

* If the process gives, at an earlier stage than the end, a matrix
entirely composed of zeros, the rank of the original determinant is
correspondingly less, being equal to the number of condensations
needed to give zeros.

† " Largest " refers to the number of rows, not to the numerical
value.

3. Thurstone's method used on a hierarchy. Thurstone's 
rule about the rank includes Spearman's hierarchy as a
special case, for in a hierarchy the tetrads (that is, the
minors of order two) vanish. The rank is therefore one,
and a hierarchical set of tests can be analysed into one 
common factor plus a specific in each. A simple way of 
introducing the reader to Thurstone's hypothesis and also 
to his " centroid " method * of finding a set of factor satura- 
tions will be to use it first of all on the perfect Spearman 
hierarchy which we cited as an artificial example in our 
first chapter. 

Tests     1      2      3      4      5      6
  1       .     .72    .63    .54    .45    .36
  2      .72     .     .56    .48    .40    .32
  3      .63    .56     .     .42    .35    .28
  4      .54    .48    .42     .     .30    .24
  5      .45    .40    .35    .30     .     .20
  6      .36    .32    .28    .24    .20     .



The first step in Thurstone's method, after the rank has 
been found, is to place in the blank diagonal cells numbers 
which will cause these cells also to partake of the same rank 
as the rest of the matrix, numbers which, for a reason which 
will become clear later, are called " communalities." In 
our present Spearman example that rank is one, i.e. the
tetrads vanish. The communalities, therefore, must be
such numbers as will make also those tetrads vanish which
include a diagonal cell : this enables them to be calculated.
Let us, for example, fix our attention on the communality
of the first test, which we will designate $h_1^2$ (the reason for
the " square " will become apparent later). Then the
tetrad formed by Tests 1 and 2 with Tests 1 and 3 is :

           1         3
  1     h_1^2      .63
  2      .72       .56

and the tetrad-difference has to vanish. Therefore

$$ .56\,h_1^2 - .72 \times .63 = 0, \qquad \therefore\; h_1^2 = .81 $$

* We shall see why it is called the " centroid " method in Section
9 of Chapter VI, after we have learned to use a " pooling square."

Similarly all the communalities can be calculated, and 
found to be 

  .81   .64   .49   .36   .25   .16

(The observant reader will notice that they are the squares 
of the " saturations " of our first chapter ; but let us con- 
tinue with Thurstone's method as though we had not 
noticed this.) 
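The calculation of all six communalities from vanishing tetrads can be sketched as below. This is a hedged illustration in Python; the helper `communality` is a name invented here, and it simply solves, for each test, one tetrad that contains that test's diagonal cell.

```python
# Off-diagonal correlations of the six-test hierarchy (diagonal cells blank).
R = [
    [None, 0.72, 0.63, 0.54, 0.45, 0.36],
    [0.72, None, 0.56, 0.48, 0.40, 0.32],
    [0.63, 0.56, None, 0.42, 0.35, 0.28],
    [0.54, 0.48, 0.42, None, 0.30, 0.24],
    [0.45, 0.40, 0.35, 0.30, None, 0.20],
    [0.36, 0.32, 0.28, 0.24, 0.20, None],
]

def communality(R, i):
    """Solve h_i^2 * r_jk - r_ij * r_ik = 0, using the first two other
    tests j and k; in a perfect hierarchy any such pair gives the same value."""
    j, k = [t for t in range(len(R)) if t != i][:2]
    return R[i][j] * R[i][k] / R[j][k]

communalities = [round(communality(R, i), 4) for i in range(len(R))]
print(communalities)
```

For the first test the pair chosen is Tests 2 and 3, so the equation solved is the one worked in the text.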

Thurstone's method of finding the saturations of each 
test with the first common factor is then to insert the com- 
munalities in the diagonal cells and add up the columns * 
of the matrix, thus : 

Original Correlation Matrix

 (.81)   .72    .63    .54    .45    .36
  .72   (.64)   .56    .48    .40    .32
  .63    .56   (.49)   .42    .35    .28
  .54    .48    .42   (.36)   .30    .24
  .45    .40    .35    .30   (.25)   .20
  .36    .32    .28    .24    .20   (.16)

  3.51   3.12   2.73   2.34   1.95   1.56      15.21

* This, the " centroid " method of finding a set of loadings, is not in
any way bound up with Thurstone's theorem about the rank and
the number of common factors. It can be used, for example, with
unity in each diagonal cell, in which case it will give as many factors
as there are tests and saturations somewhat resembling those given
by Hotelling's process described in Chapter V : and vice versa
Hotelling's process could be used on the matrix with communalities
inserted.

The column totals are then themselves added together
(15.21) and the square root taken (3.90). The " saturations "
of the first (and here the only) common factor
are then the columnar totals divided by this square root,
namely

  3.51   3.12   2.73   2.34   1.95   1.56
  ----   ----   ----   ----   ----   ----
  3.90   3.90   3.90   3.90   3.90   3.90

or  .9     .8     .7     .6     .5     .4

as in the present instance we already know them to be. 
(Very often in multiple-factor analysis the " saturation " 
of a test with a factor is called the " loading," and this is 
a convenient place to introduce the new term.) 
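The first step of the centroid calculation on this hierarchy can be written out as follows. The sketch is in Python and is offered only as an illustration of the arithmetic just described; the variable names are not from the source.

```python
import math

# The hierarchical matrix with the communalities in the diagonal cells.
R = [
    [0.81, 0.72, 0.63, 0.54, 0.45, 0.36],
    [0.72, 0.64, 0.56, 0.48, 0.40, 0.32],
    [0.63, 0.56, 0.49, 0.42, 0.35, 0.28],
    [0.54, 0.48, 0.42, 0.36, 0.30, 0.24],
    [0.45, 0.40, 0.35, 0.30, 0.25, 0.20],
    [0.36, 0.32, 0.28, 0.24, 0.20, 0.16],
]

# Add up the columns, add the column totals, and take the square root.
column_totals = [sum(row[j] for row in R) for j in range(len(R))]
grand_total = sum(column_totals)
root = math.sqrt(grand_total)

# Each loading is a column total divided by that square root.
loadings = [total / root for total in column_totals]

print([round(t, 2) for t in column_totals])
print(round(grand_total, 2), round(root, 2))
print([round(value, 4) for value in loadings])
```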

As applied to the hierarchical case, this method of 
finding the saturations or loadings had been devised and 
employed many years previously by Cyril Burt, though it 
is not quite clear how he would have filled in the blank 
diagonal cells (Burt, 1917, 53, footnote, and 1940, 448, 462). 
It should be explained that in actual practice Thurstone 
and his followers do not calculate the minor determinants 
to find the rank and the communality, for that would be 
too laborious. Instead they adopt the approximation of 
inserting in each diagonal cell the largest correlation 
coefficient of the column (see Chapter X). 

4. The second stage of the " centroid " method. If there is 
more than one common factor, the process goes on to 
another stage. Even with our example we can show the 
beginning of this second stage, which consists in forming 
that matrix of correlations which the first factor alone 
would produce. This is done by writing the loadings 
along the two sides of a chequer board and filling every cell 
of the chequer board with the product of the loading of 
that row with the loading of that column, thus : 

First-factor Matrix

          .9     .8     .7     .6     .5     .4
  .9     .81    .72    .63    .54    .45    .36
  .8     .72    .64    .56    .48    .40    .32
  .7     .63    .56    .49    .42    .35    .28
  .6     .54    .48    .42    .36    .30    .24
  .5     .45    .40    .35    .30    .25    .20
  .4     .36    .32    .28    .24    .20    .16

This is the " first-factor matrix," which gives the parts of
the correlations due to the first factor. This matrix has now 
to be subtracted from the original matrix to find the resi- 
dues which must be explained by further common factors. 
In our present example the first-factor matrix is identical 
with the original matrix and the residues are all zero. Only 
the one common factor is therefore required. (Of course, 
the reader will understand that in a real experimental 
matrix the residues can never be expected to be exactly 
zero : one is content when they are near enough to zero to 
be due to chance experimental error.) Had the rank of 
our original matrix of correlations been, however, higher 
than one, there would have been a matrix of residues. 
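The chequer-board multiplication and the subtraction that yields the residues can be sketched like this. It is a minimal Python illustration under the assumption that the hierarchical matrix, with communalities inserted, is entered in hundredths; none of the names come from the source.

```python
loadings = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

# The original matrix with communalities, entered in hundredths.
hundredths = [
    [81, 72, 63, 54, 45, 36],
    [72, 64, 56, 48, 40, 32],
    [63, 56, 49, 42, 35, 28],
    [54, 48, 42, 36, 30, 24],
    [45, 40, 35, 30, 25, 20],
    [36, 32, 28, 24, 20, 16],
]
original = [[value / 100 for value in row] for row in hundredths]

# Chequer board: each cell is the loading of its row times that of its column.
first_factor = [[a * b for b in loadings] for a in loadings]

# Residues: what the first factor leaves unexplained.
residues = [[original[i][j] - first_factor[i][j] for j in range(6)]
            for i in range(6)]
largest_residue = max(abs(value) for row in residues for value in row)
print(largest_residue)  # zero apart from floating-point noise
```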

Let us now make an artificial example with a larger 
number of common factors, say three, which we can after- 
wards use to illustrate the further stages of Thurstone's 
method. We can do this in an illuminating manner by 
the aid of the oval diagrams described in Chapter I. 

5. A three-factor example. In Figure 7, a diagram of the
overlapping variances of four tests, let us insert three
common factors and specifics to complete the variance of each
test to 10 (to make our arithmetical work easy). No factor here
is common to all the four tests. The factor with a variance of
4 runs through Tests 1, 2, and 3. That with a variance 3 runs
through Tests 2, 3, and 4. That with a variance 2 runs through
Tests 1 and 4. The other factors are specifics.

Figure 7.

The four test variances being each 10, the correlation
coefficients are written down from the overlaps by inspection as :

          1      2      3      4
  1     (.6)    .4     .4     .2
  2      .4    (.7)    .7     .3
  3      .4     .7    (.7)    .3
  4      .2     .3     .3    (.5)

Moreover, we can put into our matrix the communalities
corresponding to our diagram. Each communality is, in 
fact, that fraction of the variance of a test which is not 
specific. Thus .6 of the variance of Test 1 is " communal,"
.4 being specific or " selfish." In this way we have the
matrix above, with communalities inserted. We can now 
pretend that it is an experimental matrix, ready for the 
application of Thurstone's method, as follows : 



            (.6)      .4       .4       .2
             .4      (.7)      .7       .3       Original experimental
             .4       .7      (.7)      .3       matrix.
             .2       .3       .3      (.5)

             1.6      2.1      2.1      1.3      = 7.1 = 2.6646²
            .6005    .7881    .7881    .4879     = 2.6646 *

1st Loadings
  .6005    (.3606)   .4733    .4733    .2930
  .7881     .4733   (.6211)   .6211    .3845     First-factor
  .7881     .4733    .6211   (.6211)   .3845     matrix.
  .4879     .2930    .3845    .3845   (.2380)

Here it is seen that the loadings of the first factor, when
cross-multiplied in a chequer board, give a first-factor
matrix which is not identical with the original experimental
matrix, unlike the case of the former, hierarchical, matrix.
Here (as we who made the matrix know) one factor will
not suffice. We subtract the first-factor matrix from the
original experimental matrix to see how much of the
correlations still has to be explained, and how much of the
" communalities " or communal variances. The latter
were

  .6      .7      .7      .5

and of these amounts the first factor has explained

  .3606   .6211   .6211   .2380

* This check should always be applied. To avoid complication
it is not printed in the later tables. It applies to the loadings with
their temporary signs (see below).

If we subtract the first-factor matrix, element by element,
from the original experimental matrix, we get the residual
matrix :

 (.2394)  -.0733   -.0733   -.0930
 -.0733   (.0789)   .0789   -.0845      First residual
 -.0733    .0789   (.0789)  -.0845      matrix.
 -.0930   -.0845   -.0845   (.2620)

To this matrix we are now going to apply exactly the same 
procedure as we applied to the original experimental 
matrix, in order to find the loadings of the second factor. 
But we meet at once with a difficulty. The columns of the 
residual matrix add up exactly * to zero ! This always 
happens, and is indeed a useful check on our arithmetical 
work up to this point, but it seems to stop our further 
progress. 
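That the columns of a residual matrix must sum to zero follows directly from the way the loadings are formed. The short derivation below is an added illustration; the symbols $c_j$ (column total), $T$ (grand total) and $l_j$ (loading) are introduced here and are not the author's notation.

```latex
c_j = \sum_i r_{ij}, \qquad T = \sum_j c_j, \qquad l_j = \frac{c_j}{\sqrt{T}}, \qquad \sum_i l_i = \frac{T}{\sqrt{T}} = \sqrt{T}

\sum_i \left( r_{ij} - l_i\, l_j \right) = c_j - l_j \sum_i l_i = c_j - \frac{c_j}{\sqrt{T}} \cdot \sqrt{T} = 0
```

The same argument applies at every later stage, provided the loadings are taken with their temporary signs.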

To get over this difficulty we change temporarily the signs 
of some of the tests in order to make a majority of the cells 
of each column of the matrix positive. The practice 
adopted by Thurstone in The Vectors of Mind is to change 
the sign of the test with most minuses in its column and 
row, and so on until there is a large majority of plus signs. 
This is the best plan. Copy the signs on a separate paper, 
omitting the diagonal signs, which never change. Since 
some signs will change twice or thrice, use the convention 
that a plus surrounded by a ring means minus, and if then 
covered by an X means plus again. Near the end, watch 
the actual numbers, for the minus signs in a column may 
be very small. The object is to make the grand total 
a maximum, and thus take out maximum variance with 
each factor. We shall here, however, for simplicity adopt 
his easier rule given in A Simplified Factor Method, i.e. 
to seek out the column whose total regardless of signs is 
the largest, and then temporarily change the signs of 
variables so as to make all the signs in that column positive. 
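The easier rule can be tried on the first residual matrix printed above. The following Python sketch is an illustration only, using the four-decimal residues as given; the names `abs_totals`, `tests_to_change` and `changed` are invented here.

```python
# First residual matrix, to four decimals as printed.
residual = [
    [ 0.2394, -0.0733, -0.0733, -0.0930],
    [-0.0733,  0.0789,  0.0789, -0.0845],
    [-0.0733,  0.0789,  0.0789, -0.0845],
    [-0.0930, -0.0845, -0.0845,  0.2620],
]
n = len(residual)

# Column totals regardless of sign; pick the largest.
abs_totals = [round(sum(abs(row[j]) for row in residual), 4) for j in range(n)]
column = abs_totals.index(max(abs_totals))

# Change the sign of every test with a negative entry in that column.
tests_to_change = [i for i in range(n) if residual[i][column] < 0]
signs = [-1 if i in tests_to_change else 1 for i in range(n)]

# Changing a test's sign reverses both its row and its column,
# so a cell changes only when exactly one of its two tests is changed.
changed = [[signs[i] * signs[j] * residual[i][j] for j in range(n)]
           for i in range(n)]

print(abs_totals)
print(column + 1, [i + 1 for i in tests_to_change])  # counted from 1, as in the text
```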

* When enough decimals have been retained. In practice there
may be a discrepancy in the last decimal place.

† Changing the sign of Test 4 would here have the same result,
but for uniformity of routine we stick to the letter of the rule.

The sums of the above columns, regardless of sign, are

  .4790   .3156   .3156   .5240

and therefore we must change the signs of tests so as to
make all the signs in Column 4 positive ; that is, we must
change the signs of the first three tests.† Since we change
the three row signs, as well as the three column signs, this
will leave a block of signs unchanged, but will make the
last column and the last row all positive. We now have :



             .2394    -.0733    -.0733   (-).0930
            -.0733     .0789     .0789   (-).0845     First residual
            -.0733     .0789     .0789   (-).0845     matrix with
           (-).0930  (-).0845  (-).0845    .2620      changed signs.

             .1858     .1690     .1690     .5240      = 1.0478 = 1.0236²

             .1815     .1651     .1651     .5119      2nd Loadings with
                                                      temporary signs.

  .1815      .0329     .0300     .0300     .0929
  .1651      .0300     .0273     .0273     .0845      Second-factor
  .1651      .0300     .0273     .0273     .0845      matrix.
  .5119      .0929     .0845     .0845     .2620

             .2065    -.1033    -.1033     .0001
            -.1033     .0516     .0516      .         Second residual
            -.1033     .0516     .0516      .         matrix.
             .0001      .         .         .







On the matrix with these temporarily changed signs we 
then operate exactly as we did on the original experimental 
matrix, and obtain second-factor loadings which (with 
temporary signs) are 



  .1815   .1651   .1651   .5119



The second-factor matrix, that is, the matrix showing 
how much correlation is due to the second factor, is then 
made on a chequer board still using the temporary signs, 
and subtracted from the previous matrix of residues (with 
its temporary signs, not with its first signs) to find the 
residues still remaining, to be explained by further factors. 
In the present instance we see that the whole variance of 
the fourth test entirely disappears, and also all the correla- 
tions in which that test is concerned.* This test, therefore, 
is fully explained by the two factors already extracted.
Only the first three test variances remain unexhausted,
and their correlations. Again the columns of the residual
matrix sum exactly to zero. Following our rule, the signs
of Tests 2 and 3 have to be temporarily changed before
the process can continue. After these changes of sign the
second residual matrix is as follows, and the same operation
as before is again performed on it :

   .2065   (-).1033  (-).1033     .      Second residual
 (-).1033    .0516     .0516      .      matrix with signs
 (-).1033    .0516     .0516      .      temporarily
    .         .         .         .      changed.

   .4131     .2065     .2065      .      = .8261 = .9089²

   .4545     .2272     .2272      .      3rd Loadings with
                                         temporary signs.

* When enough decimals are retained. We shall treat the
.0001 as zero.

With these third-factor loadings we can now calculate the 
variances and correlations due to the third factor : and we 
find these are exactly equal to the second residual matrix. 
On subtracting, the third residual matrix we obtain is 
entirely composed of zeros. (In a practical example we 
should be content if it was sufficiently small.) We thus 
find (as our construction of the artificial tests entitled us to 
expect) that the matrix of correlations can be completely 
explained by three common factors. 
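The whole extraction, with the simplified sign rule and the bookkeeping of temporary signs, can be gathered into one routine. The sketch below is in Python and is an illustration under stated assumptions, not Thurstone's own procedure as printed: the function name `centroid_analysis` and the tolerance `tol` (used so that floating-point noise is not mistaken for a negative residue) are inventions of this example. Because it carries full precision rather than four decimals, its loadings may differ from the printed ones in the fourth place.

```python
import math

def centroid_analysis(matrix, n_factors, tol=1e-9):
    """Extract centroid factors one at a time. Returns the loadings with
    their proper signs (one list per factor) and the final residual matrix."""
    n = len(matrix)
    residual = [row[:] for row in matrix]
    signs = [1] * n          # running record of the temporary sign changes
    loadings = []
    for _ in range(n_factors):
        # Simplified rule: find the column whose total regardless of sign
        # is largest, and change the sign of each test negative in it.
        abs_totals = [sum(abs(residual[i][j]) for i in range(n)) for j in range(n)]
        k = abs_totals.index(max(abs_totals))
        flips = [-1 if residual[i][k] < -tol else 1 for i in range(n)]
        residual = [[flips[i] * flips[j] * residual[i][j] for j in range(n)]
                    for i in range(n)]
        signs = [s * f for s, f in zip(signs, flips)]

        # Column totals, grand total, square root, loadings (temporary signs).
        totals = [sum(residual[i][j] for i in range(n)) for j in range(n)]
        grand_total = sum(totals)
        if grand_total <= tol:
            break
        root = math.sqrt(grand_total)
        temporary = [t / root for t in totals]
        loadings.append([s * t for s, t in zip(signs, temporary)])

        # Subtract this factor's matrix from the matrix with temporary signs.
        residual = [[residual[i][j] - temporary[i] * temporary[j] for j in range(n)]
                    for i in range(n)]
    return loadings, residual

# The artificial matrix of Figure 7, communalities inserted.
matrix = [
    [0.6, 0.4, 0.4, 0.2],
    [0.4, 0.7, 0.7, 0.3],
    [0.4, 0.7, 0.7, 0.3],
    [0.2, 0.3, 0.3, 0.5],
]
loadings, residual = centroid_analysis(matrix, n_factors=3)
for number, column in enumerate(loadings, start=1):
    print(number, [round(value, 4) for value in column])
print(max(abs(value) for row in residual for value in row))
```

Keeping a running product of sign changes for each test means the proper signs are restored as each factor is extracted, which is the same correction as reversing a test's sign in the current column and each later column.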

After the analysis has been completed, some care is 
needed in returning from the temporary signs of the load- 
ings to the correct signs. The only safe plan is to write 
down first of all the loadings with their temporary signs 
as they came out in the analysis. In our present example 
these happen to be all positive, though that will not 
always occur. 

Loadings with Temporary Signs

Test       I        II       III
  1      .6005    .1815    .4545
  2      .7881    .1651    .2272
  3      .7881    .1651    .2272
  4      .4879    .5119      .



Now, in obtaining Loadings II the signs of Tests 1, 2, and 
3 were changed. We must, therefore, in the above table 
reverse the signs of the loadings of these three tests in
Column II and each later column. Then in obtaining
Loadings III the signs of Tests 2 and 3 were changed ; that
is, in our case changed back to positive. The loadings 
with their proper signs are therefore as shown in the first 
three columns of this table : 







Loadings of the Factors (Signs Replaced)

Test       I         II        III      Specific
  1      .6005    -.1815    -.4545      .6324
  2      .7881    -.1651    +.2272      .5477
  3      .7881    -.1651    +.2272      .5477
  4      .4879     .5119       .        .7071



In this table each column of loadings, for the common 
factors after the first, adds up to zero. The loading of the 
specific is found from the fact that in each row the sum of 
the squares must be unity, being the whole variance of the 
test. The inner product * of each pair of rows gives the 
correlation between those two tests (Garnett, 1919a). 
Thus 

$$ r_{12} = .6005 \times .7881 + .1815 \times .1651 - .4545 \times .2272 = .4000 $$

in agreement with the entry in the original correlation 
matrix. With artificial data like the present, the analysis 
results in loadings which give the correlations back exactly. 
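The same check can be made for every pair of tests at once. This Python sketch is an added illustration that uses the four-decimal common-factor loadings of the table above, so the correlations come back correct only to about the fourth decimal place.

```python
# Common-factor loadings I, II, III with their proper signs.
loadings = [
    [0.6005, -0.1815, -0.4545],
    [0.7881, -0.1651,  0.2272],
    [0.7881, -0.1651,  0.2272],
    [0.4879,  0.5119,  0.0],
]

def inner_product(a, b):
    """Sum of the products in pairs of two series of numbers."""
    return sum(x * y for x, y in zip(a, b))

# Off-diagonal cells are correlations; diagonal cells are communalities.
reproduced = [[round(inner_product(a, b), 4) for b in loadings] for a in loadings]
for row in reproduced:
    print(row)
```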

It will be seen that all the signs in any column of the 
table of loadings can be reversed without making any 
change in the inner products of the rows ; that is, without 
altering the correlations. We would usually prefer, there- 
fore, to reverse the signs of a column like our Column III, 
so as to make its largest member positive. 

* By the " inner product " of two series of numbers is meant the
sum of their products in pairs. Thus the inner product of the two
sets

      a  b  c  d
and   A  B  C  D

is aA + bB + cC + dD

The amount which each factor contributes to the variance
of the test is indicated by the square of its loading in that
test. The sum of the squares of the three common-factor
loadings gives the " communality " which we originally
deduced from Figure 7 and inserted in the diagonal cells of
our original correlation matrix. These facts can be better
seen if we make a table of the squares of the above loadings :





Variance contributed by Each Factor

Test        I        II       III     Communality   Specific Variance   Total
  1       .3606    .0329    .2065       .6000            .4000            1
  2       .6211    .0273    .0516       .7000            .3000            1
  3       .6211    .0273    .0516       .7000            .3000            1
  4       .2380    .2620      .         .5000            .5000            1
Total    1.8408    .3495    .3097      2.5000           1.5000            4



6. Comparison of the analysis with the diagram. The 
reader has probably been turning from this calculation of 
the factor loadings back to the four-oval diagram with 
which we started (page 26), to detect any connection ; and 
has been disappointed to find none. The fact is that the 
analysis to which the Thurstone method has led us is, 
except that it too has three common factors, a different 
analysis from that which the original diagram naturally 
invites. That diagram gave for the variance due to each 
factor the following : 



Variance contributed by Each Factor

Test        I       II      III     Communality   Specific Variance   Total
  1        .4       .       .2          .6              .4              1
  2        .4      .3       .           .7              .3              1
  3        .4      .3       .           .7              .3              1
  4        .       .3       .2          .5              .5              1
Totals    1.2      .9       .4         2.5             1.5              4

and the factor loadings are the positive square roots of
these.

Loadings of the Factors

Test        I        II       III     Specifics
  1       .6325      .      .4472       .6324
  2       .6325    .5477      .         .5477
  3       .6325    .5477      .         .5477
  4         .      .5477    .4472       .7071



The only points in common between the two analyses are 
that they both have the same communalities (and therefore 
the same specific variances) and the same number of com- 
mon factors. The Thurstone analysis has two general 
factors (running through all four tests), while the diagram 
had none : and the Thurstone analysis has several negative 
loadings, while the diagram had none. We shall see later 
that Thurstone, after arriving at this first analysis, en- 
deavours to convert it into an analysis more like that of 
our diagram, with no negative loadings and no completely 
general factors. This is one of the most difficult yet 
essential parts of his method. 
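That the diagram's own analysis, different as it looks, returns the same correlations and communalities can be verified directly. The sketch below is an added Python illustration; the `factors` list simply restates the variances and memberships described for Figure 7, and the variable names are invented here.

```python
import math

# Each common factor of Figure 7: its variance (out of a test variance of 10)
# and the tests it runs through.
factors = [(4, {1, 2, 3}), (3, {2, 3, 4}), (2, {1, 4})]
tests = [1, 2, 3, 4]

# A loading is the positive square root of the fraction of variance contributed.
loadings = {t: [math.sqrt(variance / 10) if t in members else 0.0
                for variance, members in factors]
            for t in tests}

# Inner products of pairs of rows give the correlations.
correlations = {(a, b): round(sum(x * y for x, y in zip(loadings[a], loadings[b])), 4)
                for a in tests for b in tests if a < b}

# Sums of squares of the rows give the communalities.
communalities = {t: round(sum(x * x for x in loadings[t]), 4) for t in tests}

print(correlations)
print(communalities)
```

Both sets of loadings therefore satisfy the same correlation matrix, which is why the correlations alone cannot decide between the two analyses.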

7. Analysis into two common factors. When we began 
our analysis of the matrix of correlations corresponding to 
Figure 7, we simply put the communalities suggested by 
that figure into the blank diagonal cells. That served to 
illustrate the fact that the Thurstone method of calculation 
will bring out as many factors as correspond to the com- 
munalities used, here three factors. But it disregarded 
(intentionally for the purpose of the above illustration) a 
cardinal point of Thurstone's theory, that we must seek
for the communalities which make the rank of the matrix a 
minimum, and therefore the number of common factors a 
minimum. We simply accepted the communalities sug- 
gested by the diagram. Let us now repair our omission 
and see if there is not a possible analysis of these tests into 
fewer than three common factors. There is no hope of
reducing the rank to one, for the original correlations give
two of the three tetrads different from zero, and we may
(in an artificial example) assume that there are no experi- 
mental or other errors. But there is nothing in the experi- 
mental correlations to make it certain that rank 2 
cannot be attained. With only four tests (far too few, be 
it remembered, for an actual experiment) there is no minor 
of order three entirely composed of experimentally obtained 
correlations. It may then be the case that communalities 
can be found which reduce the rank to 2. Indeed, as we 
shall see presently, many sets of communalities will do so, 
of which one is shown here : 

    (.2667)    .4       .4       .2
      .4      (.7)      .7       .3
      .4       .7      (.7)      .3
      .2       .3       .3      (.15)

These communalities .2667 (that is, 12/45), .7, .7, and .15 make every
three-rowed minor exactly zero. For example, the minor

    (.2667)    .4       .2
      .4      (.7)      .3
      .2       .3      (.15)

becomes by "pivotal condensation":

    .0267      0
      0        0

and finally 0.
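
The vanishing of every three-rowed minor can be verified mechanically. The sketch below is a minimal illustration in Python, assuming that "pivotal condensation" means replacing each element by its cross-product with the pivot row and column, without division by the pivot (this agrees with the figures .19 and .07 obtained later in the chapter). The function names are hypothetical.

```python
from itertools import combinations


def condense(m):
    """One step of pivotal condensation on the top-left element of m.

    Each element of the smaller matrix is the cross-product
    m[0][0] * m[i][j] - m[i][0] * m[0][j] (no division by the pivot).
    """
    pivot = m[0][0]
    size = len(m)
    return [[pivot * m[i][j] - m[i][0] * m[0][j] for j in range(1, size)]
            for i in range(1, size)]


# Correlations of the four tests with the communalities 12/45, .7, .7, .15
# inserted in the diagonal cells.
R = [[12 / 45, 0.4, 0.4, 0.2],
     [0.4, 0.7, 0.7, 0.3],
     [0.4, 0.7, 0.7, 0.3],
     [0.2, 0.3, 0.3, 0.15]]


def final_value(m, rows, cols):
    """Condense a three-rowed minor twice, down to a single number.

    The result is the pivot times the minor, so it vanishes exactly when
    the minor does (the pivot is never zero in this example).
    """
    sub = [[m[i][j] for j in cols] for i in rows]
    return condense(condense(sub))[0][0]


# The minor on Tests 1, 2, and 4 after its first condensation.
minor_124 = condense([[R[i][j] for j in (0, 1, 3)] for i in (0, 1, 3)])
print([[round(x, 4) for x in row] for row in minor_124])

# All sixteen three-rowed minors of the four-rowed matrix.
all_vanish = all(abs(final_value(R, rows, cols)) < 1e-12
                 for rows in combinations(range(4), 3)
                 for cols in combinations(range(4), 3))
print("every three-rowed minor vanishes:", all_vanish)
```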

It must, therefore, be possible to make a four-oval
diagram, showing only two common factors, and indeed
more than one such diagram can be found. One is shown
in Figure 8.

[Figure 8.]

This gives exactly the correct correlations. For example

$$r_{23} = \frac{12 + 2}{\sqrt{20 \times 20}} = \frac{14}{20} = .7$$

$$r_{24} = \frac{12}{\sqrt{20 \times 80}} = \frac{12}{40} = .3$$

It also gives the communalities .2667, .7, .7, .15. For
example, in Test 1, variance to the amount of 12 out of
45 is communal, and 12/45 = .2667.

The insertion of these communalities, therefore, in the 
matrix of correlations ought to give a matrix which only 
two applications of Thurstone's calculation should com- 
pletely exhaust. The reader is advised to carry out the 
calculation as an exercise. He will find for the first-factor 
loadings 

    .5000    .8290    .8290    .3750

and if in the first residual matrix, following our rule, he
changes temporarily the signs of Tests 2 and 3, the second-
factor loadings will be

    .1291   -.1128   -.1128    .0968

The second residual matrix will be found to be exactly 
zero in each of its sixteen cells. The variance (square of 
the loading) contributed by each factor to each test is then 
in this analysis : 



                 Variance contributed by Each Factor

Test          I         II      Communality    Specific Variance    Total
  1         .2500     .0167       .2667             .7333             1
  2         .6873     .0127       .7000             .3000             1
  3         .6873     .0127       .7000             .3000             1
  4         .1406     .0094       .1500             .8500             1
Totals     1.7652     .0515      1.8167            2.1833             4
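
For readers who prefer to check the exercise by machine, the two applications of the centroid calculation can be sketched in Python. This is a sketch under stated assumptions, not a definitive implementation: it takes the centroid loadings to be the column sums divided by the square root of the grand total, and it applies the temporary sign change to Tests 2 and 3 as described in the text. Function names are hypothetical.

```python
import math


def centroid_loadings(m):
    """Column sums divided by the square root of the grand total."""
    col_sums = [sum(row[j] for row in m) for j in range(len(m))]
    root_total = math.sqrt(sum(col_sums))
    return [s / root_total for s in col_sums]


def residual(m, loadings):
    """Remove from m the part accounted for by one factor."""
    n = len(m)
    return [[m[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


def reflect(m, signs):
    """Change the signs of chosen tests in both rows and columns."""
    n = len(m)
    return [[signs[i] * signs[j] * m[i][j] for j in range(n)]
            for i in range(n)]


# Correlation matrix with the communalities 12/45, .7, .7, .15 inserted.
R = [[12 / 45, 0.4, 0.4, 0.2],
     [0.4, 0.7, 0.7, 0.3],
     [0.4, 0.7, 0.7, 0.3],
     [0.2, 0.3, 0.3, 0.15]]

first = centroid_loadings(R)
res1 = residual(R, first)

# Temporarily change the signs of Tests 2 and 3, extract the factor,
# then restore the signs on the loadings.
signs = [1, -1, -1, 1]
second = [s * a for s, a in zip(signs, centroid_loadings(reflect(res1, signs)))]
res2 = residual(res1, second)

print("factor I :", [round(a, 4) for a in first])
print("factor II:", [round(a, 4) for a in second])
print("largest second residual:", max(abs(x) for row in res2 for x in row))
```

The loadings agree with those of the text to about three decimal places; small differences in the fourth place are to be expected, since the printed figures were presumably worked with rounded intermediate values.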



If we now compare these analyses, we see that the three
common factors of the previous analysis "took out," as
the factorial worker says, a variance of 2.5 of the total 4,
leaving 1.5 for the specifics. The present analysis leaves
2.1833 for the specifics, which here form a larger part of
the four tests.

8. Alexander's rotation. We saw in Section 6 that the 
Thurstone method there led to an analysis which was 
different from the analysis corresponding to the diagram 
with which we began. That is also the case with the 
present analysis into two common factors; the very fact
that it gives the second factor two negative loadings shows 
this, for the diagram (Figure 8) corresponds to positive 
loadings only. We said, too, in Section 6 that a difficult 
part of Thurstone's method was the conversion of the 
loadings into new and equivalent loadings which are all 
positive. This will form the subject of a later and more 
technical chapter ; but a simple illustration of one method 
of conversion (or " rotation " as it is called, for a reason 
which will become clear later) can be given from our present 
example. It is a method which can be used only if we have 
reason to think that one of our tests contains only one 
common factor (Alexander, 1935, 144). Let us suppose in 
our present case that from other sources we know this fact 
about Test 1. The centroid analysis has given us the 
loadings shown in the first two columns of this table : 



            Unrotated                        Rotated              Rotated
            Loadings                         Loadings             Loadings
Test       I        II     Communality      I*       II*         I**      II**
  1      .5000    .1291      .2667        .5164      .         .4781    .1952
  2      .8290   -.1128      .7000        .7746    .3162       .8367      .
  3      .8290   -.1128      .7000        .7746    .3162       .8367      .
  4      .3750    .0968      .1500        .3873      .         .3586    .1464



The communalities are also shown; they are the sums of
the squares of the loadings. If now we know or decide to
assume that Test 1 has really only one common factor, and
if we want to preserve the communalities shown, then the
loading of factor I* in Test 1 must be the square root of
.2667, namely .5164.

The loadings of factor I* in the other three tests can
now be found from the fact that they must give the corre-
lations of those tests with Test 1, since Test 1 has no
second factor to contribute. The loadings shown in
column I* are found in this way: for example, .7746 is
the quotient of .5164 divided into $r_{12}$ (.4), and .3873 is
similarly $r_{14}$ (.2) divided by .5164.

The contributions of factor I* to the communalities are
obtained by squaring these loadings. In Test 1, we
already know that factor I* exhausts the communality, for
that is how we found its loading. We discover that in
Test 4, factor I* likewise exhausts the communality, for
the square of .3873 is .1500. The other two tests, however,
have each an amount of communality remaining equal to
.1000 (i.e. $.7000 - .7746^2$). The square root of .1000,
therefore (.3162), must be the loading of factor II* in
Tests 2 and 3. The double column of loadings ought now
to give all the correlations of the original correlation
matrix, and we find that it does so. Thus, e.g.

$$r_{23} = .7746 \times .7746 + .3162 \times .3162 = .7000$$

$$r_{24} = .7746 \times .3873 = .3000$$

Moreover, the analysis into factors I* and II* corre-
sponds exactly to Figure 8. For example, the loading of
factor II* in Test 2 in that diagram is the square root of
2/20 (.3162); and the loading of factor I* in Test 4 is the
square root of 12/80 (.3873).
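
The steps of this rotation are simple enough to be set out as a short Python sketch. It is offered only as an illustration for the present two-factor example: the function name `rotate_on_pure_test` is hypothetical, and taking the positive square root for the second factor is an assumption that happens to suit these data.

```python
import math

# Correlations of the four tests (diagonal cells are not used here).
R = [[0.0, 0.4, 0.4, 0.2],
     [0.4, 0.0, 0.7, 0.3],
     [0.4, 0.7, 0.0, 0.3],
     [0.2, 0.3, 0.3, 0.0]]
communality = [12 / 45, 0.7, 0.7, 0.15]


def rotate_on_pure_test(corr, h, pure):
    """Loadings on two factors when test `pure` has only one common factor."""
    n = len(h)
    root = math.sqrt(h[pure])
    # The first factor must reproduce every correlation with the pure test.
    first = [root if k == pure else corr[pure][k] / root for k in range(n)]
    # Whatever communality remains is given to the second factor.
    second = []
    for k in range(n):
        left_over = h[k] - first[k] ** 2
        second.append(math.sqrt(left_over) if left_over > 1e-12 else 0.0)
    return first, second


first_star, second_star = rotate_on_pure_test(R, communality, pure=0)
first_2star, second_2star = rotate_on_pure_test(R, communality, pure=1)

print("I*  :", [round(a, 4) for a in first_star])
print("II* :", [round(a, 4) for a in second_star])
print("I** :", [round(a, 4) for a in first_2star])
print("II**:", [round(a, 4) for a in second_2star])
```

Calling the function with `pure=0` treats Test 1 as the test with a single common factor; `pure=1` does the same for Test 2 and gives the alternative set of loadings.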

If, however, the experimenter had reasons for thinking 
that Test 2 (not Test 1) was free from the second common 
factor, his " rotation " of the loadings would have given a 
different result, shown in the table opposite in columns 
I** and II**. This set of loadings also gives the correct 
communalities and the experimental correlations, but does 
not correspond to Figure 8. A diagram can, however, be 
constructed to agree with it (Figure 9), and the reader is 
advised to check the agreement by calculating from the 
diagram the loadings of each factor, the communalities of 
each test, and the correlations.

We have had, in Figures 7, 8, and 9, three different
analyses of the same matrix of correlations. If with
Thurstone we decide that analyses must always use the
minimal number of common factors, we will reject
Figure 7. Between Figures 8 and 9, however, this
principle makes no choice. Much of the later and more
technical part of Thurstone's method is taken up with his
endeavours to lay down conditions which will make the
analysis unique.

[Figure 9.]

9. Unique communalities. The first requirement for a
unique analysis is that the set of communalities which gives
the lowest rank should be unique, and this is not the case
with a battery of only four tests and minimal rank 2, like
our example. There are many different sets of com-
munalities, all of which reduce the matrix of correlations
of our four tests to rank 2. If, for example, we fix the
first communality arbitrarily, say at .5, we can condense
the determinant to one of order 3 by using .5 as a pivot
(as on page 22) except that the diagonal of the smaller
matrix will be blank:

    (.5)    .4     .4     .2
     .4      .     .7     .3
     .4     .7      .     .3
     .2     .3     .3      .

      .     .19    .07
     .19     .     .07
     .07    .07     .



We can then fill the diagonal of the smaller matrix with
numbers which will make each of its tetrads zero, namely

    .19    .19    .0258

and then, working back to the original matrix, find the
communalities

    .5    .7    .7    .1316

which make its rank exactly 2. We can similarly insert
different numbers for the first communality and calculate
different sets of communalities, any one set of which will
reduce the rank to 2. In this way we can go from 1.0
down to .22951 for the first communality without obtain-
ing inadmissible magnitudes for the others. Some sets
are given in the following table * :



      1          2        3         4           Sum
    1.0         .7       .7      .12963      2.52963
     .7         .7       .7      .13030      2.23030
     .5         .7       .7      .13158      2.03158
     .3         .7       .7      .14         1.84
     .2667      .7       .7      .15         1.8167
     .256       .7       .7      .1583       1.8143
     .25        .7       .7      .1667       1.8167
     .24        .7       .7      .20         1.84
     .23        .7       .7      .7          2.33
     .22951     .7       .7     1.0          2.62951
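
The rows of such a table can be generated by following the procedure just described: fix the first communality, condense on it, fill the blank diagonal of the smaller matrix so that its tetrads vanish, and work back. The Python sketch below is a minimal illustration for this four-test example only; the function name is hypothetical, and the sketch breaks down when a condensed correlation is exactly zero (as happens at a first communality of 12/45), so that value is not tried.

```python
# Correlations of the four tests; the diagonal cells are blank (unused).
R = [[0.0, 0.4, 0.4, 0.2],
     [0.4, 0.0, 0.7, 0.3],
     [0.4, 0.7, 0.0, 0.3],
     [0.2, 0.3, 0.3, 0.0]]


def communalities_for_rank_two(h1):
    """Communalities giving rank 2 when the first one is fixed at h1."""
    def condensed(i, j):
        # Off-diagonal cell of the smaller matrix, with pivot h1.
        return h1 * R[i][j] - R[0][i] * R[0][j]

    c23, c24, c34 = condensed(1, 2), condensed(1, 3), condensed(2, 3)
    # Diagonal of the smaller matrix that makes all its tetrads zero.
    diagonal = (c23 * c24 / c34, c23 * c34 / c24, c24 * c34 / c23)
    # Work back: diagonal cell = h1 * h_k - r_1k squared.
    return [h1] + [(d + R[0][k] ** 2) / h1
                   for k, d in zip((1, 2, 3), diagonal)]


for h1 in (1.0, 0.7, 0.5, 0.3, 0.25, 0.24, 0.23):
    h = communalities_for_rank_two(h1)
    print([round(x, 5) for x in h], "sum", round(sum(h), 5))
```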



If, however, we search for and find a fifth test to add to 
the four, which will still permit the rank to be reduced to 
2, this fifth test will fix the communalities at some point 
or other within the above range. Suppose that this test 
gave the correlations shown in the last row and column : 



           1        2        3        4        5
  1        .       .4       .4       .2      .5883
  2       .4        .       .7       .3      .2852
  3       .4       .7        .       .3      .2852
  4       .2       .3       .3        .      .1480
  5     .5883    .2852    .2852    .1480       .



If we now try to find communalities to reduce this 
matrix to rank 2 (as can be done), we find only the one 
set 

    .7    .7    .7    .13030    .5

* The circumstance that the communalities of Tests 2 and 3
remain fixed and alike is due to these tests being identical except for
their specific. This lightens the arithmetic, but would not occur
in practice.

The reader can try this by assigning an arbitrary value for
the first one,* and then condensing the matrix on the lines
employed above, when he will always find some obstacle
in the way unless he chooses .7. Try, for example, .5 for
the first communality:



    (.5)       .4        .4        .2       .5883
     .4         .        .7        .3       .2852
     .4        .7         .        .3       .2852
     .2        .3        .3         .       .1480
    .5883     .2852     .2852     .1480       .

     (x)       .19       .07     -.09272
     .19        .        .07     -.09272
     .07       .07        .      -.04366
   -.09272   -.09272   -.04366      .



Now, if the upper matrix is to be of rank 2, the
second condensation must give only zeros (see footnote,
page 22). But if we fix our attention on different tetrads
in the lower matrix which contain the pivot x, we see that
they give, if they have to be zero, incompatible values for
x. Thus from one tetrad we get x = .19, from another
x = .14866. With .5 as first communality, rank 2
cannot be attained. With five tests (or more), if rank 2
can be attained at all, it can only be by one unique set of
communalities. Just as it took three tests to enable the
saturations with Spearman's g to be calculated, so it takes
five tests to enable communalities due to two common
factors to be calculated. For larger numbers of common
factors, the number of tests required to make the set of
communalities unique is shown in the following table
(Vectors, 77). The lower numbers are given by the
formula

$$\frac{2r + 1 + \sqrt{8r + 1}}{2}$$



r Factors     1    2    3    4    5    6    7    8    9   10   11   12
n Tests       3    5    6    8    9   10   12   13   14   15   17   18
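
The lower row of the table can be reproduced from the formula with a few lines of Python. The sketch assumes that a fractional value of the formula is to be taken up to the next whole number, which is what the tabled figures suggest; the function name `min_tests` is hypothetical.

```python
import math


def min_tests(r):
    """Smallest whole number not less than (2r + 1 + sqrt(8r + 1)) / 2."""
    return math.ceil((2 * r + 1 + math.sqrt(8 * r + 1)) / 2)


table = [(r, min_tests(r)) for r in range(1, 13)]
for r, n in table:
    print(f"r = {r:2d}   n = {n:2d}")
```

The formula gives a whole number without rounding only when 8r + 1 is a perfect square, that is for r = 1, 3, 6, 10, and so on.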



* Alternatively, the communalities (which are now unique) can 
be found by equating to zero those three-rowed minors which have 
only one element in common with the diagonal (Vectors, 86). In 
this connection see Ledermann, 1937.

If we were actually confronted with the matrix of correla-
tions shown on page 39, and asked what the communalities 
were which reduced it to the lowest possible rank, we would 
find it very unsatisfactory to have to guess at random and 
try each set ; and our embarrassment would be still greater 
if there were more tests in the battery, as would actually be 
the case in practice. There would also be sampling error 
(which in this our preliminary description of Thurstone's 
method we are assuming to be non-existent). Under these 
circumstances, devices for arriving rapidly at approximate 
values of the communalities are very desirable. The plan 
adopted by Thurstone will be described in Chapter X, to 
which a reader who wants rapid instruction in his methods 
of calculation should next turn. 
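
The trial-and-error search described above can at least be mechanized for the five-test example. The following Python sketch is an illustration only: for each trial value of the first communality it condenses on that value and compares the two values of the blank cell x demanded by two different tetrads of the smaller matrix. The choice of tetrads and the function name are assumptions of the sketch.

```python
# The five-test matrix of correlations; diagonal cells are blank (unused).
R5 = [[0.0, 0.4, 0.4, 0.2, 0.5883],
      [0.4, 0.0, 0.7, 0.3, 0.2852],
      [0.4, 0.7, 0.0, 0.3, 0.2852],
      [0.2, 0.3, 0.3, 0.0, 0.1480],
      [0.5883, 0.2852, 0.2852, 0.1480, 0.0]]


def pivot_values(h1):
    """Two values of x (the cell for Test 2) required by two tetrads."""
    def condensed(i, j):
        return h1 * R5[i][j] - R5[0][i] * R5[0][j]

    # Tetrad on Tests 2, 3, 4 of the smaller matrix.
    x_a = condensed(1, 2) * condensed(1, 3) / condensed(2, 3)
    # Tetrad on Tests 2, 4, 5 of the smaller matrix.
    x_b = condensed(1, 3) * condensed(1, 4) / condensed(3, 4)
    return x_a, x_b


for h1 in (0.3, 0.5, 0.7, 0.9):
    x_a, x_b = pivot_values(h1)
    print(f"h1 = {h1:.1f}   x = {x_a:.5f} or {x_b:.5f}   gap = {abs(x_a - x_b):.5f}")
```

Only near .7 do the two values of x agree (to within the rounding of the four-figure correlations); at .5 they are .19 and .14866, as found in the text.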

NOTE, 1945. With six tests the communalities which reduce to 
rank 3 are not necessarily unique, for there are, or there may be, 
two sets of them. See Wilson and Worcester, 1939. 

I think the ambiguity, which is not practically important, only 
occurs when n is exactly equal to the quantity at the foot of the 
opposite page, e.g. when r = 3, 6, 10, etc. 



CHAPTER III 

THE SAMPLING THEORY 

1. Two views. A hierarchical example as explained by one 
general factor. The advance of the science of factorial 
analysis of the mind to its present position has not taken 
place without opposition, and it is the purpose of the pre- 
sent chapter to give a preliminary description of some 
objections which have been frequently raised by the 
present writer (Thomson, 1916, 1919a, 1935b, etc.) and
which indeed he still holds to, although there has been of
late years a considerable change of emphasis in the inter-
pretations placed upon factors by the factorists themselves,
which have tended to remove his objections. Briefly, the
opposition between the two points of view would dis-
appear if factors were admitted to be only statistical
coefficients, possibly without any more "reality" than an
average, or an index of the cost of living, or a standard
deviation, or a correlation coefficient; though, on the other
hand, it may be admitted that some of them, Spearman's 
g for example, may come to have a very real existence in 
the sense of being both useful and influential in the lives 
of men. 

There seems to be room for some form of integration of a 
number of apparently antithetical ideas regarding the way 
in which the mind functions, and the sampling theory 
which the writer has put forward * seems in particular to 
show that what have been called "monarchic," "oli-
garchic," and "anarchic" doctrines of the mind (Abilities,
Chapters II-V) are very probably only different ways of 
describing the same phenomena. 

* For a general statement see Brown and Thomson, 1921, Chapter
X, and Thomson, 1935b, and references there given. A somewhat
similar point of view has in more recent years been taken in America
by R. C. Tryon, 1932a and b, and 1935.

The contrast (perhaps one should say the apparent
contrast) between the factorial and the sampling points
of view * can be best seen by considering the explanation
of the same set of correlation coefficients by both views.
As we have consistently done, so far, in this part of our
book, we shall again suppose that there are no experi-
mental or sampling errors (we shall consider them
abundantly in due course), and to simplify the argument
we shall take in the first place a set of correlation coefficients
whose tetrads are exactly zero, which can therefore be
completely "explained" by a general factor g and specifics,
as in this table:





          1       2       3       4
  1       .     .746    .646    .527
  2     .746      .     .577    .471
  3     .646    .577      .     .408
  4     .527    .471    .408      .



We can more exactly follow the argument if we employ 
the vulgar fractions of which these are the decimal 
equivalents, namely the following, each divided by 6 : 

          1       2       3       4
  1       .      √20     √15     √10
  2      √20      .      √12     √8
  3      √15     √12      .      √6
  4      √10     √8      √6       .

In this form the tetrad-differences are all obviously zero 
by inspection. These correlations can therefore be ex- 
plained by one general factor, as in Figure 10, which gives 
them exactly. 

* Two papers by S. C. Dodd (1928 and 1929) gave a very full and
competent comparison of the two theories up to that date. The
present writer agrees with a great deal, though not with all, of what
Dodd says; but see the later paper (Thomson, 1935b) and also
Chapter XX of this book.

[Figure 10.]   [Figure 12.]   [Figure 13.]

We have here a general factor of variance 30 which is the
sole cause of the correlations, and specific factors of
variances 6, 15, 30, and 60. The variances of the four
"tests" are 36, 45, 60, and 90. The "communalities"
and "specificities" are:



Test               1        2        3        4       Totals
Communality      30/36    30/45    30/60    30/90     2.333
Specificity       6/36    15/45    30/60    60/90     1.667
Totals             1        1        1        1         4



These communalities can be calculated from the corre-
lation coefficients, for it will be remembered (Chapter I,
Section 4) that when tetrad-differences are exactly zero,
each correlation coefficient can be expressed as the
product of two correlation coefficients with g (two
"saturations"). Thus

$$r_{23} = r_{2g} r_{3g}$$

Therefore

$$\frac{r_{12} r_{13}}{r_{23}} = \frac{(r_{1g} r_{2g})(r_{1g} r_{3g})}{r_{2g} r_{3g}} = r_{1g}^2$$

the square of the saturation of Test 1 with g. And when
there is only one common factor, the square of its satura-
tion is the communality.

The quantity $r_{12} r_{13} / r_{23}$, therefore, means, on this theory
of one common factor, the communality, or square of the
saturation with g, of the first test. Its value in our
example is 30/36, or five-sixths.
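
The same quantity can be computed for each of the four tests directly from the table of correlations. The Python sketch below uses the exact values (the square roots divided by 6); the helper names `r` and `communality` are hypothetical.

```python
import math

# Numbers under the square-root sign for each pair of tests; every
# correlation is the square root divided by 6.
surds = {(1, 2): 20, (1, 3): 15, (1, 4): 10,
         (2, 3): 12, (2, 4): 8, (3, 4): 6}


def r(i, j):
    """Correlation between tests i and j (numbered from 1)."""
    return math.sqrt(surds[(min(i, j), max(i, j))]) / 6


def communality(i, j, k):
    """Square of the saturation of test i with g, from tests j and k."""
    return r(i, j) * r(i, k) / r(j, k)


for test, (j, k) in {1: (2, 3), 2: (1, 3), 3: (1, 2), 4: (1, 2)}.items():
    print(f"Test {test}: {communality(test, j, k):.4f}")
```

Because every tetrad-difference is zero, the choice of the two companion tests makes no difference: `communality(1, 2, 4)` gives the same five-sixths as `communality(1, 2, 3)`.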

2. The alternative explanation. The sampling theory. 
The alternative theory to explain the zero tetrad- 
differences is that each test calls upon a sample of the bonds 
which the mind can form, and that some of these bonds are 
common to two tests and cause their correlation. In the 
present instance we have arranged this artificial example 
so that the tests can be looked upon as samples of a very 
simple mind, which can form in all 108 bonds (or some 
multiple of 108).* The first test uses five-sixths of these 
(or 90), the second test four-sixths (or 72), the third three- 
sixths (54), and the fourth two-sixths (or 36). These 
fractions are the same in value as the communalities of 
the former theory. Each of them may be called the 
" richness " of the test. Thus Test 1 is most rich, and 
draws upon five-sixths of the whole mind. The fractions
$r_{ij} r_{ik} / r_{jk}$, which in the former theory were "communali-
ties," are in the sampling theory "coefficients of rich-
ness." They formerly indicated the fraction of each test's
variance supplied by g; they indicate here the fraction
which each test forms of the whole "mind" (but see later,
concerning "sub-pools").

* There is nothing mysterious about the number 108. It is
chosen merely because it leads to no fractions in the diagram.
Any large number would do.

Now, if our four tests use respectively 90, 72, 54, and 36
of the available bonds of the mind, as indicated in Figure
11, then there may be almost any kind of overlap between
two of the tests. Any of the cells of the diagram may have
contents, instead of all being empty except for g and the
specifics. If we know nothing more about the tests except
the fractions we have called their "richnesses," we cannot
tell with certainty what the contents of each cell will be;
but we can calculate what the most probable contents will
be. If the first test uses five-sixths and the second test
four-sixths of the mind's bonds, it is most probable that
there will be a number of bonds common to both tests
equal to 5/6 × 4/6, or 20/36ths of the total number. That is,
the four cells marked a, b, c, d in the diagram, the cells
common to Tests 1 and 2, will most likely contain

$$\frac{20}{36} \times 108 = 60 \text{ bonds}$$

between them. By an extension of the same principle we
can find the most probable number in each cell. Thus c,
the number of bonds used in all four of the tests, is most
probably

$$\frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{2}{6} \times 108 = 10 \text{ bonds.}$$

In this way we reach the most probable pattern of
overlap of the four tests shown in Figure 12. And this
diagram gives exactly the same correlations as did Figure 10.
Let us try, for example, the value of $r_{23}$ in each diagram.
In Figure 10 we had

$$r_{23} = \frac{30}{\sqrt{45 \times 60}} = \frac{\sqrt{12}}{6} = .577$$

In Figure 12 the same correlation is

$$r_{23} = \frac{20 + 10 + 4 + 2}{\sqrt{72 \times 54}} = \frac{\sqrt{12}}{6} = .577$$

This form of overlap, therefore, will give zero tetrad-
differences, just as the theory of one general factor did.
More exactly, this sampling theory gives zero tetrad-
differences as the most probable (though not the certain)
connexion to be found between correlation coefficients
(Thomson, 1919a).
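
The most probable contents of every cell, and the correlations which follow from them, can be worked out in a few lines. The Python sketch below is a minimal illustration, assuming (as the text does) that the most probable number of bonds in a cell is the pool multiplied by the richness of each test that uses the cell and by one minus the richness of each test that does not. Tests are numbered from 0 in the code, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
import math

N = 108  # bonds in the whole pool
richness = [Fraction(5, 6), Fraction(4, 6), Fraction(3, 6), Fraction(2, 6)]


def expected_cell(tests_in):
    """Most probable number of bonds used by exactly the tests in tests_in."""
    count = Fraction(N)
    for k, p in enumerate(richness):
        count *= p if k in tests_in else 1 - p
    return count


def expected_shared(i, j):
    """Most probable number of bonds common to tests i and j."""
    others = [k for k in range(4) if k not in (i, j)]
    return sum(expected_cell({i, j, *extra})
               for size in range(len(others) + 1)
               for extra in combinations(others, size))


def correlation(i, j):
    """Shared bonds over the geometric mean of the two tests' bonds."""
    bonds_i, bonds_j = N * richness[i], N * richness[j]
    return float(expected_shared(i, j)) / math.sqrt(float(bonds_i * bonds_j))


print("cell used by all four tests:", expected_cell({0, 1, 2, 3}))
print("bonds common to Tests 1 and 2:", expected_shared(0, 1))
print("r23 =", round(correlation(1, 2), 3))
tetrad = correlation(0, 2) * correlation(1, 3) - correlation(0, 3) * correlation(1, 2)
print("a tetrad-difference:", round(tetrad, 12))
```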

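If we let $p_1$, $p_2$, $p_3$, and $p_4$ represent fractions which the
four tests form of the whole pool of N bonds of the mind,
then the number common to the first two tests will most
probably be $p_1 p_2 N$, and the correlation between the tests

$$r_{12} = \frac{p_1 p_2 N}{\sqrt{p_1 N \times p_2 N}} = \sqrt{p_1 p_2}$$

We therefore have, in any tetrad, quantities like the
following:

$$\begin{array}{c|cc} & 3 & 4 \\ \hline 1 & \sqrt{p_1 p_3} & \sqrt{p_1 p_4} \\ 2 & \sqrt{p_2 p_3} & \sqrt{p_2 p_4} \end{array}$$

and the tetrad-difference is, most probably (Thomson,
1927a, 253)

$$\sqrt{p_1 p_3} \sqrt{p_2 p_4} - \sqrt{p_1 p_4} \sqrt{p_2 p_3} = 0$$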

This may be expressed by saying that the laws of proba- 
bility alone will cause a tendency to zero tetrad-differences 
among correlation coefficients. In another form, which 
will be useful later, this statement can be worded thus : 
The laws of probability or chance cause any matrix of 
correlation coefficients to tend to have rank 1, or at 
least to tend to have a low rank (where by rank we mean 
the maximum order among those non-vanishing minors
which avoid the principal diagonal elements). 

It is, in the opinion of the present writer, this fact (a
result of the laws of chance and not of any psychological
laws) which has made conceivable the analysis of mental
abilities into a few common factors (if not into one only,
as Spearman hoped) and specifics. Because of the laws
of chance the mind works as if it were composed of these
hypothetical factors g, v, n, etc., and a number of specific
factors. The causes may be "anarchic," meaning that
they are numerous and unconnected, yet the result is
"monarchic," or at least "oligarchic," in the sense that
it may be so described, provided always that large specific
factors are allowed.

Of course, if the tetrad-differences actually found among
correlation coefficients of mental tests were really exactly 
zero, or so near to zero that the discrepancies could be 
looked upon as " errors " due to our having tested a 
particular set of persons who did not accurately represent 
the whole population, then the theory of only one general 
factor would have to be accepted. For it gives exactly 
zero tetrad-differences, whereas the sampling theory only 
gives a tendency in that direction. But in actual fact it 
is only a tendency which is found, and matrices of correla- 
tion coefficients do not give zero tetrad-differences until 
they have been carefully purified by the removal of tests 
which " break the hierarchy." It has not proved very 
difficult to arrive at such purified teams of hierarchical 
tests. That is to be expected on the Sampling Theory, 
according to which hierarchical order is the most probable 
order. In the same way one would not have to go on 
throwing ten pennies for long before arriving at a set 
which gave five heads and five tails, for that is the most 
probable (yet not the certain) result. 

3. Specific factors maximized. The specific factors play,
in the Spearman and Thurstone methods of factorization,
an important rôle, and our present example can be used to
illustrate the fact, which is not usually realized, that both
these methods maximize the specifics (Thomson, 1938c) by
their insistence on minimizing the number of general
factors. In Figure 10, of the whole variance of 4, the
specific factors contribute 1.667, or 41.7 per cent. In
Figure 12, they contribute only

$$\frac{10}{90} + \frac{4}{72} + \frac{2}{54} + \frac{1}{36} = \frac{250}{1{,}080} = .2315, \text{ or 5.8 per cent.}$$
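
The two totals can be confirmed with exact fractions. In the Python sketch below each pair gives a test's specific variance and its whole variance, taken from the figures quoted in the text for Figures 10 and 12; the function name `specific_share` is hypothetical.

```python
from fractions import Fraction

# (specific variance, whole variance) for Tests 1 to 4.
figure_10 = [(6, 36), (15, 45), (30, 60), (60, 90)]
figure_12 = [(10, 90), (4, 72), (2, 54), (1, 36)]


def specific_share(cells):
    """Total specific variance of the standardized tests, and its percentage."""
    total = sum(Fraction(s, v) for s, v in cells)
    return total, float(total) / len(cells) * 100


for name, cells in (("Figure 10", figure_10), ("Figure 12", figure_12)):
    total, per_cent = specific_share(cells)
    print(f"{name}: {float(total):.4f} of 4, or {per_cent:.1f} per cent.")
```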

Apart from certain trivial exceptions which do not occur
in practice, it is generally true that minimizing the number
of common factors maximizes the variance of the specifics.
Numerous other analyses of the above correlations can be
made (Thomson, 1935a), but they all give a variance to
the specifics which is less than 1.667. Here, for example,
in Figure 13 (page 44), is an analysis which has no g
factor but six other common factors, and which gives a
total specific variance of .3056, or 7.6 per cent.

The same principle, that reducing the number of
common factors tends to increase the variance of the
specifics, can be seen illustrated in Figures 5 and 6 (Chap-
ter I, page 15). Figure 6 has five common factors, and the
proportion which the specific variance bears to the whole
four tests is 0.4 (the sum of four fractions each with
denominator 10), or 10 per cent.

In Figure 5 there are only two common factors, and the
specific variance has risen to 1.4 (again the sum of four
fractions each with denominator 10), or 35 per cent.

Again, in Figures 7, 8, and 9 (Chapter II, pages 26, 34,
and 38) the same phenomenon can be observed. In
Figure 7, with three common factors, the specific variances
form 37.5 per cent. of the four tests; in Figures 8 and 9,
with only two common factors, the specific variances form
54.6 per cent.

Now, specific factors are undoubtedly a difficulty in any 
analysis, and to have the specific factors made as large and 
important as possible is a heavy price to pay for having as 
few common factors as possible. 

Spearman, it is true, in his earlier writings, and in 
Chapter IX of The Abilities of Man, boldly accepts the 
idea of specific factors ; that is, factors which play no part 
except in one activity only, or in very closely allied acti- 
vities. His analogy of " mental energy " (g) and " neural 
machines " (the specifics) always makes a considerable 
appeal to an audience. On that analogy the energy of the 
mind is applicable in any of our activities, as the electric 
energy which comes into a house is applicable in several 
different ways : in a lighting-bulb, a radio set, a cooking- 
stove, a heater, possibly an electric razor, etc. Some of 
the specific machines which use the electric energy need
more of it than do others, just as some mental activities
are more highly saturated with g. If it fails, they all 
cease to work ; if it weakens, they all work badly. Yet 
when it is strong, they do not all work equally well : the 
electric carpet-sweeper may function badly while the 
electric heater functions well, because of a faulty connec- 
tion in the (specific) carpet-sweeping machine ; while Jones 
next door (enjoying the same general electric supply) 
possesses no electric carpet-sweeper. So two men may 
have the same g, but only one of them possess the specific 
neural machine which will enable him to perform a certain 
mental task. The analogy is attractive, and, it must be 
agreed, educationally and socially useful. There is no 
objection to accepting it so far. But with the complication 
of group factors it begins to break down. Most activities 
are found to require the simultaneous use of several 
" machines." There does not seem so sharp a distinction 
between the machines and the general energy. Moreover, 
the general energy, if there be such a thing, of our person- 
alities is commonly held to be of instinctive and emotional 
nature rather than intellective, while g, whatever else 
it is, is commonly thought of as closely connected with 
intelligence. 

That specific factors are a difficulty seems to be recog-
nized by Thurstone. "The specific variance of a test," he
writes (Vectors, 63), "should be regarded as a challenge,"
and he looks forward to splitting a specific factor up into 
group factors by brigading the test in question with new 
companion tests in a new battery. It seems clear that 
the dissolution of specifics into common factors is unlikely 
to happen if each analysis is conducted on the principle of 
making the specific variances as large as possible. We 
must, however, leave this point here, to return to it in a 
later chapter of this book. 

4. Sub-pools of the mind. A difficulty which will occur 
to the reader in connexion with the sampling theory is that, 
when the correlation between two tests is large, it seems to 
imply that each needs nearly the whole mind to perform 
it (Spearman, 1928, 257). In our example the correlation 
between Tests 1 and 2 was .746, a correlation not infre-
quently reached between actual tests. It is, for instance,
almost exactly the correlation reported by Alexander 
between the Stanford-Binet test and the Otis Self- 
administering test (Alexander, 1935, Table XVI). Does 
this, then, mean that each of these tests requires the 
activity of about four-sixths or five-sixths of all the 
" bonds " of the brain ? Not necessarily, even on the 
sampling theory. These two tests are not so very unlike 
one another, and may fairly be described as sampling the 
same region of the mind rather than the whole mind, so 
that they may well include a rather large proportion of the 
bonds found in that region. They may be drawn, that is, 
from a sub-pool of the mind's bonds rather than from the 
whole pool (Thomson, 1935b, 91; Bartlett, 1937a, 102).
Nor need the phrase " region of the mind " necessarily 
mean a topographical region, a part of the mind in the 
same sense as Yorkshire is part of England. It may mean 
something, by analogy, more like the lowlands of England, 
all the land easily accessible to everybody, lying below, 
say, the 300-foot contour line. What the " bonds " of the 
mind are, we do not know. But they are fairly certainly 
associated with the neurones or nerve cells of our brains, 
of which there are probably round about ten thousand 
million in each normal brain. Thinking is accompanied 
by the excitation of these neurones in patterns. The 
simplest patterns are instinctive, more complex ones 
acquired. Intelligence is possibly associated with the 
number and complexity of the patterns which the brain 
can (or could) make. A " region of the mind " in the 
above paragraph may be the domain of patterns below a 
certain complexity, as the lowlands of England are below 
a certain contour line. Intelligence tests do not call upon 
brain patterns of a high degree of complexity, for these 
are always associated with acquired material and with the 
educational environment, and intelligence tests wish to 
avoid testing acquirement. It is not difficult to imagine 
that the items of the Stanford-Binet test call into some 
sort of activity nearly all the neurones of the brain, though 
they need not thereby be calling upon all the patterns 
which those neurones can form. When a teacher is
demonstrating to an advanced class that "a quadratic
form of rank 2 is identically equal to the product of 
two linear forms," he is using patterns of a complexity far 
greater than any used in answering the Binet-Simon items. 
But the neurones which form these patterns may not be 
more numerous. Those complicated patterns, however, 
are forbidden to the intelligence tester, for a very intelligent 
man may not have the ghost of an idea what a "quadratic
form" is. Within the limits of the comparatively simple
patterns of the brain which they evoke, it seems very 
possible that the two tests in question call upon a large 
proportion of these, and have a large number in common. 
The hope of the intelligence tester is that two brains which 
differ in their ability to form readily and clearly the 
comparatively simple patterns required by his test will 
differ in much the same way if, given the same educational 
and vocational environment, they are later called upon to 
form the much more complex patterns there found. 

As has been indicated, the author is of opinion that 
the way in which they magnify specific factors is the 
weak side of the theories of a single general factor or 
of a few common factors. That does not mean, however, 
that a description of a matrix of correlations in terms 
of these theories is inexact. Men undoubtedly do 
perform mental tasks as if they were doing so by 
means of a comparatively small number of group factors 
of wide extent, and an enormous number of specific 
factors of very narrow range but of great importance each 
within its range. Whether a description of their powers in 
terms of the few common factors only is a good description 
depends in large measure on what purpose we want the 
description to subserve. The practical purpose is usually 
to give vocational or educational advice to the man or to 
his employers or teachers, and a discussion of the relative 
virtues of different theories in this respect must wait until 
we have considered the somewhat technical matter of 
" estimation " in later chapters. We shall there see that 
factors, though they cannot improve and indeed may blur 
the accuracy of vocational estimates, may, however, 
facilitate them where otherwise they would have been
impossible, as money facilitates trade where barter is
impossible. 

As a theoretical account of each man's mind, however, 
the theories which use the smallest number of common 
factors seem to have drawbacks. They can give an exact 
reproduction of the correlation coefficients. But, because 
of their large specific factors, they do not enable us to give 
an exact reproduction of each man's scores in the original 
tests, so that much information is being lost by their use. 
Reproduction of the original scores with complete exacti- 
tude can only be achieved by using as many factors as 
there are tests. But it can be done with considerable 
accuracy by a few of Hotelling's factors (called " principal 
components "), which will be described later. 

It will be seen from considerations such as these that 
alternative analyses of a matrix of correlations, even 
although they may each reproduce the correlation coeffi- 
cients exactly, may not be equally acceptable on other 
grounds. The sampling theory, and the single general 
factor theory, can both describe exactly a hierarchical set 
of correlation coefficients, and they both give an explana- 
tion of why approximately hierarchical sets are found in 
practice. In a mathematical sense, they are alternatives. 
But as Mackie has shown (Mackie, 1928b), a psychologist
who believes that the " bonds " of the sampling theory have 
any real existence, in the sense, say, of being represented 
in the physical world by chains and patterns of neurones, 
cannot without absurdity believe in the similarly real 
existence of specific factors. The analogue to Spearman's 
g, on the sampling theory, is simply the whole mind. 
" How, then," (as Mackie asks) " can we have other 
factors independent of such a factor as this ? " Only by 
the formal device of letting the specific factor include the 
annulling of the work done by the other part of the mind, 
a legitimate mathematical procedure but not one compatible 
with actual realities. Either, then, we must give up the 
factors of the two-factor theory, or the bonds of the
sampling theory, as realities. We cannot keep both 
as realities, though we may employ either mathematically. 

5. The inequality of men. Professor Spearman has
opposed the sampling theory chiefly on the ground that
it would make all correlations equal (and zero), and involve 
the further consequence that all men are equal in their 
average attainments (Abilities, 96), if the number of 
elementary bonds is large, as the sampling theory requires. 
Both these objections, however, arise from a misunder- 
standing of the sampling theory, in which a sample means 
" some but not all " of the elementary bonds (Thomson, 
1935b, 72, 76). As has been explained, tests can differ,
on this theory, in their richness or complexity, and less 
rich tests will tend to have low, more complex tests will 
tend to have high correlations, at any rate if the " bonds " 
tend to be all-or-none in their nature, as the action of 
neurones is known to be. Neurones, like cartridges, either 
fire or they don't. And as for the assertion that the theory 
makes all men equal, there is no basis whatever for the 
suggestion that it assumes every man to have an equal 
chance of possessing every element or bond. On the con- 
trary, the sampling theory would consider men also to be 
samples, each man possessing some, but not all, both of the 
inherited and the acquired neural bonds which are the 
physical side of thought. Like the tests, some men are 
rich, others poor, in these bonds. Some are richly endowed 
by heredity, some by opportunity and education ; some
by both, some by neither. The idea that men are samples 
of all that might be, and that any task samples the powers 
which an individual man possesses, does not for a moment 
carry with it the consequences asserted of equal correlations 
and a humdrum mediocrity among human kind. 



CHAPTER IV 

THE GEOMETRICAL PICTURE* 

1. The fundamental idea. The student reading articles on 
factorial analysis is continually coming across geometrical 
and spatial expressions. For example, in Section 8 of 
our Chapter II we spoke of " rotating " the loadings of 
Thurstone's " centroid " method until they fulfil certain 
conditions. These geometrical expressions arise from the 
fact that the mathematics of mental testing is the same 
in its formal aspect as the mathematics of multi-dimensional 
space, and it is the object of the present chapter to explain 
this in elementary terms. Some degree of understanding 
of this is essential for the worker with tests, and it is not 
difficult when divested as far as possible of the algebraic 
symbols in which it is usually clothed. 

The fundamental idea is that the correlation between 
two tests can be pictorially represented by the angle 
between two lines which stand for the two tests, and which 
pass through a point, thus forming an X with its legs 
stretching ever so far in both ways. The point where the 
lines cross represents a man who 
has the average score on both 
tests. Other points on the lines 
represent standardized scores in 
the tests which are more or less 
removed from the average ; an arrowhead can be placed on each
line to represent the positive direction, as in Figure 14. If
the lines taken in the direction of these arrowheads make only
a small angle with one another, they represent tests which
are highly correlated. As the correlation decreases, this
angle increases. When the correlation is zero, the angle
is a right angle. If the angle becomes obtuse, the correlation
is negative.

Figure 14.

* See Addendum, page 353.

Any point on the paper then represents a person by his 
two standardized scores in these two tests, obtained by 
dropping perpendiculars on to the two lines representing 
the tests. If we were to measure a large number of 
persons by each of these two tests (say, ten thousand
persons) and place a dot on the paper for each person as
represented by his two scores, we would naturally find that 
these dots would be crowded most closely together round 
the point where the test lines (or test vectors, as they are 
technically called) cross, where the average man is situated. 
The ten thousand dots would look, in fact, like shot marks 
on a target of which the bull's-eye was the average man at 
the cross-roads of the test vectors. The density of the 
dots would fall off equally to the north, south, east, and 
west of this point. Their " contours of density," as we 
say, would be circles. Circles, because any line through 
the imaginary man-who-is-average-in-everything repre- 
sents a conceivable test, and the standard deviation is 
everywhere represented by the same unit of length. The 
dots would look exactly like a crowd which, equally in all 
directions, was surrounding a focus of attraction at the 
crossing-point of the tests. 

2. Sectors of the crowd. On the diagram are shown also 
two dotted lines, perpendicular respectively to the two 
test vectors. Persons who are standing on one of these 
dotted lines have exactly the average score in the test to 
which it is perpendicular. Two of the sectors of the crowd 
are distinguished by shading in the diagram. Let us fix 
our attention on the northern shaded sector, which includes 
the two positive directions of the test vectors, marked by 
the arrowheads. Everybody in this sector of the crowd 
has a score above the average in both tests. Similarly, in 
the other shaded sector of the crowd, everybody has a 
score below the average in both tests. Both these sectors 
of the crowd contribute to the correlation between the tests, 
since everybody in these sectors does well in both, or badly 
in both. 

The people in the white sectors of the crowd, however,
have scores above the average in one test and below the
average in the other. They diminish the correlation
between the tests. Those in the western white sector have
scores above the average in Test X, but below the average
in Test Y ; and vice versa for those in the eastern white 
sector. 

If the arrowheads X and Y are brought nearer together 
(while the people in the circular crowd remain standing still), 
so that the angle between the test vectors is diminished, 
the dotted lines will move so as to diminish the white 
sectors which lie between them, and the correlation will 
increase. When the test vectors are close together, one 
coinciding with the other, the white sectors will have dis- 
appeared and the correlation will be perfect. When the 
test vectors are at right angles, the white sectors will be 
quadrants, the crowd will be half " black " and half 
" white," and the correlation zero. Beyond the right-angle 
position, there will be more white than black, and a negative 
correlation.
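
The sector argument can be checked by simulation. The sketch below is an illustration added under stated assumptions, not part of the original argument: it assumes a circular normal swarm, and the function name `concordant_fraction` is hypothetical. Each shaded sector is a wedge of 180 minus θ degrees when the test vectors stand at an angle of θ degrees, so the fraction of the crowd in the two shaded sectors should come out near 1 - θ/180. The correlation of the simulated scores is printed alongside.

```python
import numpy as np

def concordant_fraction(angle_deg, n_persons=200_000, seed=0):
    """Fraction of a circular swarm in the two shaded sectors, and the score correlation."""
    rng = np.random.default_rng(seed)
    # Circular swarm: two independent standard normal coordinates per person.
    persons = rng.standard_normal((n_persons, 2))
    theta = np.radians(angle_deg)
    # Unit test vectors X and Y, separated by the angle theta.
    x_axis = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    y_axis = np.array([np.cos(theta / 2), -np.sin(theta / 2)])
    # A person's score is the foot of the perpendicular on the test vector.
    score_x = persons @ x_axis
    score_y = persons @ y_axis
    # Shaded sectors: above average in both tests, or below average in both.
    same_side = (score_x > 0) == (score_y > 0)
    return same_side.mean(), np.corrcoef(score_x, score_y)[0, 1]

for angle in (30, 60, 90, 120):
    frac, r = concordant_fraction(angle)
    print(angle, round(frac, 3), round(r, 3))
```

As the angle grows the shaded fraction shrinks and the correlation falls with it, passing through one half and zero respectively at the right angle.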

It is clear, then, that the angle between the test vectors 
inversely represents the correlation between the tests. It 
can be shown (but we shall take it on trust) that the cosine 
of the angle is equal to the correlation (Garnett, 1919a ; 
Wilson, 1928a). If we wish, therefore, to draw two vectors 
for two tests whose correlation we know, we consult a table 
of trigonometrical ratios, to find the angle whose cosine is 
equal to the correlation coefficient, and draw the lines 
accordingly. 
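
In place of a printed table of trigonometrical ratios, the same look-up can be sketched in a few lines. The function name `angle_from_correlation` is an assumption made for illustration only.

```python
import math

def angle_from_correlation(r):
    """Angle in degrees between two test vectors whose correlation is r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    return math.degrees(math.acos(r))

for r in (1.0, 0.5, 0.0, -0.5):
    print(r, round(angle_from_correlation(r), 1))
```

Positive correlations give acute angles, a zero correlation gives a right angle, and negative correlations give obtuse angles, as described above.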

3. A third test added. The tripod. If we now wish to 
draw the vector of a third test, we must similarly consult 
the trigonometrical table to find from its correlation 
coefficients the angles it makes with the two former tests. 
We shall then usually discover that we cannot draw it on 
our paper, but that it has to stick out into a third dimension. 
It will only lie in the same plane as the other two if either 
the sum or the difference of its angles with them equals 
the angle between the first two tests. Usually this is not 
the case, and the vectors of these three tests will require 
three-dimensional space. They will look like a tripod 
extended upwards as well as downwards. If the correlations
are high, the tripod's legs will be close together ; if
low, they will be far apart. This tripod analogy will make 
plausible to the reader the assertion that some sets of 
correlation coefficients cannot logically coexist. For the 
legs of a tripod cannot take up positions at any angles. 
If two of the angles are very small, the third one cannot 
be very large. The sum of any two of the angles must 
at least equal the third angle. And so on. For example, 
the following matrix of correlations is an impossibility : 

          1       2       3

   1     1.00     .34     .77
   2      .34    1.00     .94
   3      .77     .94    1.00

Here Tests 1 and 2 are highly correlated with Test 3,
so highly that they cannot possibly have only a correlation
coefficient of .34 with each other. The angles corresponding
to the above coefficients (taken as cosines) are :

          1       2       3

   1      .      70°     40°
   2     70°      .      20°
   3     40°     20°      .

and the fact that 40° + 20° is less than 70° shows that the
matrix is impossible.
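
The tripod test can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the further condition that the three angles together cannot exceed 360 degrees is added here as one instance of the "and so on" in the text rather than taken from it.

```python
import math
from itertools import permutations

def angles_deg(r12, r13, r23):
    """The three angles (in degrees) whose cosines are the three correlations."""
    return tuple(math.degrees(math.acos(r)) for r in (r12, r13, r23))

def tripod_is_possible(r12, r13, r23):
    """True if three test vectors with these mutual correlations can coexist."""
    a = angles_deg(r12, r13, r23)
    # No angle may exceed the sum of the other two.
    triangle_ok = all(x <= y + z + 1e-9 for x, y, z in permutations(a))
    # Added assumption: the three angles together cannot exceed 360 degrees.
    return triangle_ok and sum(a) <= 360 + 1e-9

print([round(x, 1) for x in angles_deg(0.34, 0.77, 0.94)])
print(tripod_is_possible(0.34, 0.77, 0.94))
print(tripod_is_possible(0.50, 0.50, 0.50))
```

The first call uses the impossible matrix of the text; the second uses three equal correlations of .50 (three angles of 60 degrees), which can coexist.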

When the symmetrical matrix of correlations is an 
impossible one, which could not really occur, it will be 
found that either the determinant itself, or one of the 
pivots in the calculation explained in Chapter II, Section 2, 
is negative. Let us carry out the calculation for the 
above matrix : 

(1.00)    .34     .77
  .34    1.00     .94
  .77     .94    1.00

        (.8844)  .6782
         .6782   .4071

Determinant = -.0999
This test serves also for larger matrices. 
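
The same check can be sketched numerically. This is a hedged illustration, not the layout of Chapter II: it uses ordinary Gaussian elimination (whose intermediate figures may be scaled differently from the pivotal condensation shown above), and the function name `pivots` is an assumption.

```python
import numpy as np

R = np.array([[1.00, 0.34, 0.77],
              [0.34, 1.00, 0.94],
              [0.77, 0.94, 1.00]])

def pivots(matrix):
    """Successive pivots from Gaussian elimination without row exchanges."""
    a = np.array(matrix, dtype=float)
    out = []
    for k in range(len(a)):
        out.append(a[k, k])
        if k + 1 < len(a):
            if a[k, k] == 0:
                raise ZeroDivisionError("zero pivot")
            # Subtract multiples of row k from the rows below it.
            factors = a[k + 1:, k] / a[k, k]
            a[k + 1:] -= np.outer(factors, a[k])
    return out

p = pivots(R)
print("pivots:", np.round(p, 4))
print("determinant:", round(float(np.prod(p)), 4))
print("smallest latent root:", round(float(np.linalg.eigvalsh(R).min()), 4))
```

A negative pivot, a negative determinant, or a negative latent root each signals that the matrix could not arise from real test vectors.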

Let us, however, return to our tripod of three vectors
which by their angles with one another represent the
correlations of three tests, the legs of the tripod being the
negative directions of the tests, let us assume, and their
continuation upward past their common crossing-point
the positive directions, though this is not essential.

The point where the three vectors cross represents the 
average man, who obtains the average score (which we 
will agree to call zero) on each of the three tests. Any 
other point in space represents a man whose scores in the 
three tests are given by the feet of perpendiculars from 
this point on to the three test vectors. If, again, we sup- 
pose that ten thousand persons have undergone these 
three tests, the space round the test vectors will be filled 
with ten thousand points, which will be most closely 
crowded together near the average man at the crossing- 
point (or " origin ") of the vectors, and will form a spherical 
swarm falling off in density equally in all directions from 
that point. 

4. A fourth test added. One test was represented by a 
line. Two tests by two lines in a plane. Three tests by 
three lines in ordinary space. Suppose now we have a 
fourth test, look up its angles with the pre-existing three 
tests, and try to draw its line or vector, adding a fourth 
leg to the tripod. Just as the third test would not usually 
lie in the plane of the first two, but required a third dimen- 
sion to project out into, so the fourth test will not usually 
be capable of being represented in the three-space of the 
three tests. Its angles with them will not fit unless we 
add a fourth dimension. 

Here, of course, the geometrical picture, strictly speaking, 
breaks down. But it is usual and mathematically helpful 
to continue to speak as though spaces of higher dimensions 
really existed. In a " space " of four dimensions we can 
imagine four test vectors crossing at a point, their angles 
with one another depending upon the correlations. We 
can imagine a " spherical " swarm of dots representing 
persons. And when we add more tests, we can similarly 
imagine spaces of 5, 6 ... n dimensions to accommodate 
their test vectors. The reader should not allow the
impossibility of visualizing these spaces of higher dimensions
to trouble him overmuch. They are only useful forms of
speech, useful because they enable us to refer concisely
to operations in several variables which are exactly
analogous to familiar operations in the real space in
which we live, such as " rotating " a line or a set of lines
round a pivot.

5. Two principal components. Let us now express the 
ideas we have used in the preceding three chapters in terms 
of this geometrical picture. Independent factors will be 
represented by vectors at right angles to one another (we 
shall for the most part be concerned only with independ- 
ent, i.e. uncorrelated factors, though at a later stage we 
shall have something to say about correlated or " oblique " 
factors). Analysing a set of tests into independent factors 
means, in terms of our geometrical picture, referring their 
test vectors to a set of rectangular vectors as axes of
co-ordinates (the Greek equivalent " orthogonal " is generally
used in this connexion instead of " rectangular "). Let
us explain this first of all in the simplest case, that of two 
tests, represented by their vectors in a plane, at the angle 
corresponding to their correlation. 

In this case, the most natural way of drawing orthogonal
co-ordinates on the paper is to place one of them (see
Figure 15) half-way between the test vectors, and the other,
of course, at right angles to the first. These factor vectors
correspond, in fact, to Hotelling's " principal components,"
to which we shall return later. Of these two factors (or
components) OA is as near as it can be to both test vectors ;
it is the " first principal component."

Figure 15.

We pictured, before, a swarm of ten thousand dots on 
the paper, each representing a person by his scores in the 
two tests, found by dropping perpendiculars from his dot 
to the two vectors. Instead of describing each point (each 
person, that is) by the two test scores, it is clear that we
could describe it by the two factor scores, the feet of
perpendiculars on to the factor vectors. It is also clear
that, as far as this purpose goes, we might have taken
our factor vectors or factor axes anywhere, and not
necessarily in the positions OA and OB, provided they
went through the point O and were at right angles. In
other words, we can " rotate " OA and OB round the
point O, and any position is equally good for describing
the crowd of persons. Either of the tests, indeed, might 
be made one of the factors. The positions shown in 
Figure 15 are advantageous only if we want to use only 
one of our factors and discard the other, in which case 
obviously OA is the one to keep, as it lies as near as possible 
to both test vectors.* The scores along OA are the best 
possible single description of the two test results. That is 
the distinguishing virtue of Hotelling's " first principal 
component." 
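
A small numerical sketch may make the special position of OA concrete. It is an illustration only: the correlation of .6 is a hypothetical value, and the variable names are assumptions. The squared loadings of the two tests on OA add up to 1 + r, which is the larger latent root of the two-test correlation matrix, and those on OB add up to 1 - r.

```python
import numpy as np

r = 0.6                       # hypothetical correlation between the two tests
theta = np.arccos(r)          # angle between the two test vectors

# OA bisects the angle between the test vectors; OB is at right angles to it.
oa = np.array([1.0, 0.0])
ob = np.array([0.0, 1.0])
test_x = np.array([np.cos(theta / 2), np.sin(theta / 2)])
test_y = np.array([np.cos(theta / 2), -np.sin(theta / 2)])

# Loadings: cosines between each test vector and each factor axis.
loadings = np.array([[test_x @ oa, test_x @ ob],
                     [test_y @ oa, test_y @ ob]])
variance_on_oa = (loadings[:, 0] ** 2).sum()
variance_on_ob = (loadings[:, 1] ** 2).sum()

# For comparison: the latent roots of the 2 x 2 correlation matrix.
roots = np.linalg.eigvalsh(np.array([[1.0, r], [r, 1.0]]))

print(np.round(loadings, 4))
print(variance_on_oa, variance_on_ob, roots)
```

Any other pair of axes at right angles reproduces the correlation equally well, but none gives a single axis a larger share than OA receives here.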

6. Spearman axes for two tests. The orthogonal axes 
chosen by Spearman for his factors are, however, none of 
the positions to which OA and OB can be rotated in the 
plane of the paper. Besides, Spearman has three factors, 
and therefore three axes, for two tests, namely the general 
factor and the two specific factors, and we cannot have 
three orthogonal axes or factor vectors on a sheet of paper. 
The Spearman factors must, for two tests, lie in three- 
dimensional space, like the three lines which meet in the 
corner of a room. If we rotate the OA and OB of Figure 15 
out of the plane of the paper (say, pushing A below the 
surface of the paper, and, say, raising B above it), we shall 
clearly have to add a third axis, at right angles to OA and 
OB, to enable us to describe the tests and the persons who 
remain on the paper. There are now three axes to rotate ; 
and they must rotate rigidly, remaining at right angles to 
one another. The point at which Spearman stops the 
rotation, and decides that the lines then represent the
" best " factors, is a position in which one of the axes is
at right angles to Test X, and another is at right angles to
Test Y. The third axis then represents g.

* Persons will, in fact, be placed in the same order of merit by
their factors A as they are placed in by their average scores on the two
tests, but this is not the case with the Hotelling first component of
larger numbers of tests.

7. Spearman axes for four tests. We are accustomed to
depicting three dimensions on a flat sheet of paper, and
so we can, in Figure 16, represent the Spearman axes g, s1,
and s2 for two tests. And since we have begun to depict other
dimensions, by means of perspective, on a flat sheet, let us
continue the process and by a kind of super-perspective imagine
that the lines s3, s4, and any others we may care to add,
represent axes sticking out into a fourth, a fifth, and higher
dimensions. Figure 16 thus represents the five Spearman axes
for four tests, of which only the vector of the first test is
shown (in its positive half only).

Figure 16.

All the five lines g, s1, s2, s3, and s4 must be imagined as
being each at right angles to all the others in five-dimensional
space. The vector of Test 1, shown in the diagram,
lies in the plane or wall edged by g and s1. It forms
acute angles with g and with s1, the cosines of which angles
are its saturations with g and s1 respectively. If it had
been highly saturated with g, it would have leaned nearer
to g and farther away from s1.

The other three axes, s2, s3, and s4, are all at right angles
to the wall or plane in which Test 1 lies. They have,
therefore, no correlation with Test 1, no share in its
composition. Test vector 2 similarly lies in the wall edged
by g and s2, test vector 3 in that edged by g and s3. The
axis g forms a common edge to all these planes. If the
battery of tests is hierarchical (that is, if the tetrad-differences
are all zero) then all the tests of the battery
can be depicted in this way, each in its own plane at right
angles to all the other planes, no test vector being in the
spaces between the " walls."

The four test vectors themselves, of course, are only in 
a four-dimensional space (a 4-space we shall say, for
brevity). Just as, when we were discussing Figure 15, we
said that Spearman used three axes which were all out of 
the plane of the paper, so here in Figure 16, with four test 
vectors (only one shown) in a 4-space, Spearman uses five 
axes in a space of one dimension higher than the number 
of tests. For n hierarchical tests, Spearman's factors are 
in an (n + 1)-space.

If along each test vector we measure the same distance 
as a unit, then perpendiculars from these points on to the 
g axis will give the saturations of the tests with g as fractions 
of this unit distance. The four dots on the g axis in Figure 
16 may thus be taken as representing the test vectors 
projected on to the " common-factor space," which is here 
a line, a space of one dimension only. Thurstone's system 
is like Spearman's except that the common-factor space is 
of more dimensions, as many as there are common factors. 
Figure 17 shows the Thurstone axes for four tests whose 
matrix of correlation coefficients can be reduced to rank 2. 

8. A common-factor space of two dimensions. Here there
are two common factors, a and b, and four specifics, s1,
s2, s3, and s4. All the six axes representing these factors
in the figure are to be imagined as existing in a 6-space,
each at right angles to all the others. The common-factor
space is here two-dimensional, the plane or wall edged by a
and b ; to make it stand out in the figure, a door and a window
have been sketched upon it.

Figure 17.

In Spearman's Figure 16, each test vector lay in a plane
defined by g and one of the specific axes. Here in Figure
17, each test vector lies in a different 3-space. These
different 3-spaces have nothing in common with one another
except the plane ab, the wall with the door and window in
the diagram. In Figure 16 the projections of the test vectors
on to the common-factor space were lines which all coincided
in direction (though they were of different lengths), for
there the common-factor space was a line. Here the
common-factor space is a plane, and the projections of the
four test vectors on to that plane are shown in the figure
by the lines on the " wall." These lines, if they are all
projections of vectors of unit length, will by their lengths on
the wall represent the square roots of the communalities.

9. The common-factor space in general. When there are 
r common factors, the common-factor space is of r dimensions,
and the whole factor space (including the specifics) is
of (n + r) dimensions. The test vectors themselves are in an 
n-space ; their projections on to the common-factor space 
are crowded into an r-space, and are naturally at smaller 
angles with one another than the actual test vectors are. 
These angles between the projected test vectors do not, 
therefore, represent by their cosines the correlations be- 
tween the tests. The angles are too small for that, and 
the cosines, therefore, too large. But if we multiply the 
cosine of such an angle by the lengths of the two projections 
which it lies between, we again arrive at the correlation. 

Thus in Figure 17, the angle between the lines 1 and 3 
on the wall is less than the angle between the actual test 
vectors 1 and 3 out in the 6-space, of which the lines on 
the wall are the projections. But the lengths of the lines 1 
and 3 on the wall are less than the unit length we marked 
off on the actual vectors, being in fact the roots of the
communalities. If we call these lengths on the wall h1 and h3,
then the product h1 h3 times the cosine of the projected
angle again gives the correlation coefficient.
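
The relation between the projected angle and the correlation can be verified on a small example. The loadings below are hypothetical, chosen only so that every communality is less than unity, and the names `common`, `full`, and `cosine` are assumptions of this sketch.

```python
import numpy as np

# Hypothetical loadings of four tests on the two common factors a and b.
common = np.array([[0.8, 0.2],
                   [0.6, 0.5],
                   [0.4, 0.7],
                   [0.1, 0.8]])
h = np.linalg.norm(common, axis=1)     # lengths on the "wall"

# Full unit test vectors in the 6-space: common part plus one specific each.
full = np.hstack([common, np.diag(np.sqrt(1 - h ** 2))])

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

i, j = 0, 2                            # Tests 1 and 3
true_r = cosine(full[i], full[j])      # correlation: cosine of the real angle
projected_cos = cosine(common[i], common[j])

print(full.shape)
print(true_r, projected_cos, h[i] * h[j] * projected_cos)
```

The projected cosine is larger than the correlation, and multiplying it by the two lengths on the wall brings it back to the correlation.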

10. Rotations. It will be remembered that Thurstone,
after obtaining a set of loadings for the common factors
by his method of analysis of the matrix of correlations,
" rotates " the axes until the loadings are all positive,
and he also likes to make as many of them as possible zero.
It is instructive to look at this procedure in the light of our 
geometrical picture from which the phrase " rotating the 
factors " is taken. It should be emphasized first of all 
that such rotation of the common-factor axes in Thur- 
stone's system must take place entirely within the com- 
mon-factor space, and the common-factor axes must not 
leave that space and encroach upon the specifics. In
Figure 16, therefore, no rotation, in Thurstone's sense, of
the g axis can be made (since the common-factor space is a
line), except indeed reversing its direction and measuring 
stupidity instead of intelligence. 

In Figure 17 the common-factor space is a plane, and
the axes a and b can be rotated in this plane, like the hands
of a clock fixed permanently at right angles to one another.
When the positive directions of a and b enclose all the
vector projections, as they do in our figure, then all the
loadings are positive. The position shown would, therefore,
fulfil this desire of Thurstone's. Moreover, one of
the loadings could be made zero, by rotating a and b until 
a coincides with line 1 (when b will have no loading in 
Test 1), or until b coincides with line 4 (when a will have 
no loading in Test 4). 
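
A rotation of this kind can be sketched as follows. The loadings are hypothetical (they are not read off Figure 17), and `rotate_axes` is a name introduced for the illustration. The axes are turned, rigidly and within the plane, until a coincides with the projection of Test 1.

```python
import numpy as np

# Hypothetical projections of four tests on the common-factor plane,
# given as loadings on the axes a and b.
loadings = np.array([[0.8, 0.2],
                     [0.7, 0.4],
                     [0.5, 0.6],
                     [0.3, 0.8]])

def rotate_axes(loadings, phi):
    """Loadings on axes a and b after turning both anticlockwise through phi radians."""
    c, s = np.cos(phi), np.sin(phi)
    new_a = np.array([c, s])        # direction of the rotated a axis
    new_b = np.array([-s, c])       # rotated b axis, still at right angles to a
    return np.column_stack([loadings @ new_a, loadings @ new_b])

# Turn the axes until a lies along the projection of Test 1.
phi = np.arctan2(loadings[0, 1], loadings[0, 0])
rotated = rotate_axes(loadings, phi)

print(np.round(rotated, 4))
print(np.allclose(rotated @ rotated.T, loadings @ loadings.T))
```

After the rotation b has no loading in Test 1, every loading is still positive, and the communalities and reproduced correlations are unchanged, since the axes have moved but the projections have not.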

When there are three common factors, the common- 
factor space is an ordinary 3-space. The three common- 
factor axes divide this space into eight octants. Rotating 
them until all the loadings are positive means until all the 
projections of the test vectors are within the positive 
octant. This will always be nearly possible if the corre- 
lations are all positive. Moreover, it is clear that we can 
always make at any rate some loadings zero. In the 
common-factor 3-space we can move one of the axes until 
it is at right angles to two of the test projections, in which 
tests that factor will then have no loading. Keeping that 
axis fixed, we can then rotate the other two axes round it, 
seeking for a position where one of them is at right angles 
to some test. The number of zero loadings obtainable 
will clearly be limited unless the configuration of the test 
vectors happens to lend itself to many zeros. We shall see 
later that Thurstone seeks for teams of tests which do this. 

Although Thurstone makes his rotations exclusively 
within the common-factor space, keeping the specifics 
sacrosanct at their maximum variance, there is, of course, 
nothing to prevent anyone who does not hold his views 
from rotating the common-factor axes into a wider space, 
and increasing the number of common-factor axes at the 
expense of the specific variance, until ultimately we reach as 
many common factors as we have tests, and no specifics. 




CHAPTER V 

HOTELLING'S "PRINCIPAL COMPONENTS" 

1. Another geometrical picture. The geometrical picture 
of the last chapter, however, is not the only form of spatial 
analogy which can be used for representing the results of 
mental tests, nor indeed was it the first in the field, though 
it is the most powerful. The earlier, and perhaps more 
natural, plan of representing two tests was by two lines 
at right angles, instead of at an angle depending on their 
correlation as in Chapter IV. Using the two lines at right 
angles, and the two test scores as co-ordinates, each person 
could, in this form of diagram also, be represented by a 
point on the paper, and his two scores by the feet of 
perpendiculars from that point on to the test axes. But 
if, on such a diagram, we mark the points of ten thousand 
persons, these will, of course, not be distributed in the same 
circular symmetrical fashion as in Figure 14 (page 55). If 
we look at Figure 14, we can see what would happen 
to the crowd of persons if we were to pull the test vectors 
farther and farther apart * until finally they were at right 
angles. The shaded northern sector of the crowd is com- 
posed of persons whose scores are above average in both 
tests, and this sector is bounded by the two dotted lines 
which are at right angles to the test vectors. As the angle 
between the test vectors grows larger, the two dotted lines 
in question close towards one another, and this shaded 
section of the crowd is driven northward. Simultaneously 
the other shaded section is driven southward. When the 
test vectors reach a position at right angles to one another, 
the dotted line at right angles to X falls along Y, and the
other along X, and we have Figure 18. The crowd is no
longer distributed in a circular fashion round the origin. 

* It is understood that they continue to stand for the same tests, 
with the same correlation, though the latter is no longer represented 
by the cosine of the angle between the vectors. 

It now bulges out to the north and south, in the quadrants
where the two test scores are either both positive or both 
negative, and its lines of equal density, formerly circles, 
have become ellipses. In this form of diagram, it is this 
ellipticity of the crowd which shows the presence of correlation
between the tests. If the tests are highly correlated, the
ellipses will be long and narrow ; if they are less correlated,
they will be plumper ; if there is no correlation, they will be
circles ; if there is negative correlation, they will be longer
the other way, i.e. from east to west in our diagram.
In our former figures in Chapter IV, the space of the diagram,
whether plane, solid, or multi-dimensional, was peopled
by a " spherical " crowd whose density fell away equally
in all directions from the origin, while correlation between
tests was indicated by the angles between their test
vectors. In the present chapter, all the test vectors are at
right angles, and the space is peopled by a crowd whose
density falls off differently in different directions unless
there is no correlation present.

Figure 18.

If we add a third test to the two in Figure 18, its axis, 
in the present system, has to be at right angles to the first 
two. The former spherical swarm of persons (of Chapter 
IV) has become now an ellipsoidal swarm, like a Zeppelin, 
with proportions determined by the correlations. If these 
are positive, its greatest length will be in the direction 
of the positive octant of space (that octant in which all 
scores are above average, i.e. positive), and the opposite 
negative octant. Its waist-line will not, as a rule, be 
circular, but elliptical. 

The ellipse of Figure 18 has two principal axes, a major 
axis from north to south, and a minor axis at right angles 
to it from east to west. The ellipsoid of three tests has 
three principal axes ; the " ellipsoid " (for we continue to 
use the term) for n tests will be in n-dimensional space and 
will have n principal axes. It is these principal axes of 
the ellipsoids of equal density which are the " principal
components " of Hotelling's method (Hotelling, 1933). 
They are exactly equal in number to the tests, but usually 
the smaller ones are so small as to be negligible, within the 
limits of exactitude reached by psychological experiment. 

2. The principal axes. Finding Hotelling's principal 
components, therefore, consists in finding those axes, all 
at right angles to one another, which lie one along each 
principal axis of the ellipsoids of equal density of the 
population of persons tested. In Figure 18, for example, 
one of them lies north and south, the other east and west. 
The crowd of persons can then be described in terms of 
these new axes, in terms of factors, that is, instead of in 
terms of the original tests. These factors are uncorrelated, 
for the crowd is symmetrically distributed with regard to 
them, though not in a circular manner. This brings us to 
one more thing that has to be done to these factors before 
they become Hotelling's principal components : they have 
to be measured in new units. The original test scores were, 
we have tacitly assumed in making our diagrams, measured 
in comparable units, namely each in units of its own 
standard deviation. But the factors arrived at by a mere 
rotation to the principal axes, in an elliptically distributed 
crowd, are no longer such that the standard deviation of 
each is represented by the same distance in the diagram. 
If in Figure 18 all the points representing people are pro- 
jected on to a horizontal east-and-west factor (Factor II), 
the feet of these perpendiculars are obviously more crowded 
together than the corresponding points would be on a 
north-and-south factor (Factor I). On this diagram, 
therefore, the standard deviation of Factor II is represented 
by a shorter distance than is the standard deviation of 
Factor I. To make these equal, we would have to stretch 
our paper from east to west, or compress it from north to 
south, until the crowd was again circular, during which 
procedure the test vectors would have to move back to 
the position of Figure 14 to keep the crowd's test scores 
equal to their projections, and we are then back at the 
space of Chapter IV. The " ellipsoidal " space of this 
present chapter, in fact, is used only until the principal 
axes of the ellipsoid are discovered, after which, by a
change of units along each principal axis, it is made into 
a " spherical " space again. 

In the preceding paragraph, the reader may feel a 
difficulty which has been known to trouble students in 
class. If, he may say, we stretch Figure 18 from east to 
west till the ellipse is a circle, that ought to separate the 
arrows of the test vectors still farther. Yet you say they 
will return to the positions shown in Figure 14 ! 

The mistake lies in thinking that stretching the space
(the plane of the paper in Figure 18) till the ellipsoid is
spherical will move the test vectors with the space. The
points representing persons move with the space ; indeed, 
they are the space. But the test vectors are not rigidly at- 
tached to the space. Each test vector must be such that 
every person's point, projected on to it, gives his score. 
If the points move about, as they do when we stretch the 
paper, the test vector must move so that this remains true, 
and in our case that means moving nearer together as the 
crowd becomes more circular. It is just the reverse of the 
process by which we obtained Figure 18 from Figure 14. 
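The geometry of Sections 1 and 2 can be checked numerically for two tests. The sketch below is not from the text: the correlation r = 0.6 is an arbitrary illustrative value, and the names `principal_axes_two_tests`, `axes`, and `loadings` are hypothetical. It rotates to the principal axes of the ellipse, shows that the two factors are uncorrelated but have unequal variances, and then changes units so that the test vectors are again separated by an angle whose cosine is r.

```python
import numpy as np

def principal_axes_two_tests(r):
    """Principal axes of the density ellipse for two standardized tests
    with correlation r, drawn on test axes at right angles (as in Figure 18)."""
    R = np.array([[1.0, r], [r, 1.0]])
    c = np.sqrt(0.5)
    # Columns are unit vectors along the two principal axes of the ellipse:
    # one through the quadrants where both scores have the same sign,
    # the other at right angles to it.
    axes = np.array([[c, c], [c, -c]])
    # Variances (diagonal) and covariance (off-diagonal) along the new axes.
    rotated = axes.T @ R @ axes
    variances = np.diag(rotated)
    # Change of units: each factor is rescaled to unit standard deviation,
    # so a test's loadings are its axis components times the factor s.d.
    loadings = axes * np.sqrt(variances)
    return rotated, variances, loadings

rotated, variances, loadings = principal_axes_two_tests(0.6)

# Cosine of the angle between the two test vectors in the "spherical" space.
cosine = loadings[0] @ loadings[1] / (
    np.linalg.norm(loadings[0]) * np.linalg.norm(loadings[1]))
```

Under these assumptions the variances along the two axes come out as 1 + r and 1 - r, and `cosine` equals r, which is the sense in which the test vectors return to the positions of Figure 14.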

3. Advantages and disadvantages. The advantage of 
Hotelling's factors can be best appreciated while the crowd 
of persons is in the ellipsoidal condition. Hotelling's first 
factor (or " component," as he calls it) runs along the 
greatest length of the crowd, and gives the best single 
description of a person's position. If we know all his 
factor scores, we know exactly where he is in the crowd. 
If we have to search for him, we would rather be told his 
position on the long axis, and search along the short ones, 
than be told his position on any other axis instead. If 
there are, say, twenty tests, there will be twenty principal 
axes ranging from longest to shortest, and twenty Hotelling 
components.* But the first four or five of these will go 
a long way towards defining a man's position in the tests, 

and will do so better than any other equally numerous set
of factors, whether of Hotelling's or of any other system.
In this respect Hotelling's factors undoubtedly stand
foremost. They will not, however, reproduce the correlations
exactly unless they are all used, whereas in Thurstone's
system a few common factors can, theoretically, do this,
though in actual practice the difference of the two systems
in this respect is not great. The chief disadvantage of
Hotelling's components is that they change when a new
test is added to the battery. When a new test is added
to a Spearman battery, provided that it conforms to the
hierarchy, g does not change in nature, though its exactness
of measurement is changed. Whether Thurstone's common
factors will remain invariant in augmented batteries,
and whether they will also do so when differently selected
samples of people are tested, are questions we shall consider
at a later stage in this book.

* All that is here said about principal components refers to the
case, which is that considered by Hotelling, in which the method
of calculation about to be described is applied to the matrix of
correlations with unities (or possibly with reliabilities) in the diagonal
cells. The method, as a means of calculation, however, could be
used to obtain loadings for the common factors after Thurstone's
communalities have been inserted, instead of the " centroid " method.
The advantage over the centroid method would be that absolutely
the maximum variance would be " taken out " by successive factors.

4. A calculation. The actual calculation of the loadings
of Hotelling's components requires, for its complete understanding,
a grasp of the method of finding algebraically the
principal axes of an ellipsoid, a problem which will be
found dealt with in three dimensions in any textbook on
solid geometry. We give an account of this, for n dimensions,
in the Appendix. Here we shall only explain
Hotelling's ingenious iterative method of doing this
arithmetically, by means of an example, for which we shall
use the matrix of correlations already employed in Chapter
II to illustrate Thurstone's method (see opposite page).

We have inserted unities in the diagonal cells, for
Hotelling's procedure does not contemplate the assumption
of specific factors (much less maximized specifics) except
possibly that part of a specific which is due to error, in
which case what are called " reliabilities " (actual correlations
of two administrations of the test) would be used in
the diagonal.

Hotelling's arithmetical process then begins with a guess
at the proportionate loadings of the first principal component.
Practically any guess will do ; a bad guess will
only make the arithmetic longer. We have guessed .8, 1,
1, .7, the numbers to be seen on the right of the matrix,
because these numbers are roughly proportional to the
sums of the four columns, and such numbers usually give
a good first guess.

Each row of the matrix is then multiplied by the guessed
number on its right, giving the matrix below the first one,
beginning with .80. We then take, as our second guess,
numbers proportional to the sums of the columns of this
matrix, namely

           1.74   2.23   2.23   1.46
   giving   .78   1      1       .65

That is, we divide the sums of the columns by their largest
member, and use the results as new multipliers. They
are seen placed farther on the right of the original matrix.
It is unusual for two of them to be of the same size ; that
is a peculiarity of our example.

It is always the original matrix whose rows are multiplied
by each improved set of multipliers. The above set gives
the next matrix shown, that beginning with .780, and the
sums of its columns

           1.710   2.207   2.207   1.406

give a third guess at the multipliers, namely

            .775   1       1        .637

And so the reiteration goes on, and the reader, who is
advised to carry it a stage farther at least, would find if he
persevered that the multipliers would change less and less.
If he went on long enough, he would reach this point
(usually, however, far fewer decimals are sufficient) :

           1.698827   2.198089   2.198089   1.384384
   giving   .772865   1          1           .629813



that is, totals in exactly the same proportion as the multipliers.
These final multipliers (or earlier ones if the experimenter
is content with less exact values) are then proportionate
to the loadings of the first Hotelling component in
the four tests. They have, however, to be reduced until
the sum of their squares equals the largest total, 2.198089,
which is called the first " latent root " of the original
matrix. This is done by dividing them by the square root
of the sum of their squares and multiplying them by the
square root of the latent root. They then become

            .662   .857   .857   .540.
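The iteration just described can be sketched in a few lines. This is an illustration under stated assumptions, not the author's own worksheet: the correlation matrix is not reproduced in this text, so the matrix `R` below is reconstructed from the figures quoted (for instance the column totals 1.74, 2.23, 2.23, 1.46 obtained from the first guess), and the function name `hotelling_first_component` is hypothetical.

```python
import numpy as np

# Assumed correlation matrix with unities in the diagonal, reconstructed
# from the totals quoted in the text; it is not printed in this chapter.
R = np.array([
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
])

def hotelling_first_component(R, guess, tol=1e-12, max_iter=1000):
    """Repeat: multiply row i by multiplier m[i], sum the columns,
    divide the totals by their largest member to get new multipliers."""
    m = np.asarray(guess, dtype=float)
    for _ in range(max_iter):
        totals = R.T @ m              # column sums of the row-multiplied matrix
        new_m = totals / totals.max()
        change = np.max(np.abs(new_m - m))
        m = new_m
        if change < tol:
            break
    root = (R.T @ m).max()            # largest total: the first latent root
    # Rescale so that the sum of squares of the loadings equals the root.
    loadings = m * np.sqrt(root / np.sum(m ** 2))
    return m, root, loadings

first_totals = R.T @ np.array([0.8, 1.0, 1.0, 0.7])
multipliers, root, loadings = hotelling_first_component(R, [0.8, 1.0, 1.0, 0.7])
```

With this assumed matrix, `first_totals` reproduces the totals of the first step, and the converged multipliers, root, and loadings agree with the values quoted in the text to about five decimals.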



The next step in Hotelling's process is similar to one 
with which we have already become familiar in Thur- 
stone's method. The parts of the variances and correla- 
tions due to this first component are calculated and sub- 
tracted from the original experimental matrix. These 
variances and correlations due to the first component are : 

The residual matrix is then treated in exactly the same
way as the original matrix, the beginnings of the process
being shown above. The guessed multipliers, proportional
to the sums of the columns, are not so near the truth this
time, for the first one, which we have guessed at .3, and
which reduces after one operation to .18, goes on reducing
until it becomes negative, the final values of these second 
loadings being as shown in the appropriate column of the 
following table, which also gives the loadings of the third 
and fourth factors, obtained in the same way. The vari- 
ances and correlations due to each factor in turn are 
subtracted from the preceding residual matrix and the new 
residual matrix analysed for the next factor : 
* These four quantities are, in the Hotelling process, what are
called the " latent roots " of the matrix.
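The successive extraction of factors from residual matrices can be sketched as follows. This is a minimal illustration, not the book's worksheet: the matrix `R` is reconstructed from figures quoted in the text and should be treated as an assumption, the starting multipliers are arbitrary (chosen only so that they are not blind to any component), and the names `dominant_component` and `hotelling_components` are hypothetical.

```python
import numpy as np

# Assumed correlation matrix (reconstructed, not printed in this chapter).
R = np.array([
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
])

def dominant_component(M, start, tol=1e-14, max_iter=10000):
    """Hotelling's iteration on a symmetric matrix M: returns the largest
    latent root and the loadings whose squares sum to that root."""
    m = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        totals = M @ m                       # M is symmetric, so M @ m = M.T @ m
        new_m = totals / totals[np.argmax(np.abs(totals))]
        # Allow for a harmless change of sign between successive steps.
        change = min(np.max(np.abs(new_m - m)), np.max(np.abs(new_m + m)))
        m = new_m
        if change < tol:
            break
    root = (m @ M @ m) / (m @ m)
    return root, m * np.sqrt(root / (m @ m))

def hotelling_components(R, start):
    """Extract every component in turn, subtracting from the residual
    matrix the variances and correlations due to each one."""
    residual = np.array(R, dtype=float)
    roots, columns = [], []
    for _ in range(residual.shape[0]):
        root, loading = dominant_component(residual, start)
        roots.append(root)
        columns.append(loading)
        residual = residual - np.outer(loading, loading)
    return np.array(roots), np.column_stack(columns)

roots, loadings = hotelling_components(R, start=[1.0, 0.5, -0.3, 0.8])
```

The sign of a whole column of loadings is arbitrary, so a comparison with a printed table is best made on absolute values.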

An alternative method of finding principal components,
due to Kelley, is to deal with the variables two at a time. 
The pair first chosen are rotated in their plane until they 
are uncorrelated. Then the same is done to another pair, 
and so on, the new uncorrelated variables being in turn 
paired with others, until finally all correlations are zero. 
(Kelley, 1935, Chapters I and VI.) A chief advantage is 
that the components are obtained pari passu, and not
successively ; also, in certain circumstances where Hotel- 
ling's process converges very slowly, Kelley's is quicker. 
The end results are the same. 
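The pairwise procedure can also be sketched. The code below is an illustration under assumptions, not Kelley's own layout: the text does not say in which order the pairs are taken, so the pair with the largest remaining correlation is rotated at each step, which makes the sketch closely resemble what is usually called the Jacobi rotation method. The matrix `R` is reconstructed from figures quoted in Section 4, and the function name is hypothetical.

```python
import numpy as np

# Assumed correlation matrix (reconstructed, not printed in this chapter).
R = np.array([
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
])

def pairwise_rotation_components(R, tol=1e-12, max_rotations=1000):
    """Rotate variables two at a time until all correlations vanish.
    Returns the variances of the final variables and the rotation V."""
    A = np.array(R, dtype=float)
    n = A.shape[0]
    V = np.eye(n)
    for _ in range(max_rotations):
        off = np.abs(A - np.diag(np.diag(A)))
        p, q = np.unravel_index(np.argmax(off), off.shape)
        if off[p, q] < tol:
            break
        # Angle that makes the rotated pair (p, q) uncorrelated.
        theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(n)
        G[p, p], G[q, q] = c, c
        G[p, q], G[q, p] = s, -s
        A = G.T @ A @ G          # covariances of the rotated variables
        V = V @ G                # accumulate the total rotation
    return np.diag(A).copy(), V

variances, V = pairwise_rotation_components(R)
roots_sorted = np.sort(variances)[::-1]
```

All components emerge together in `variances`, in no particular order of size, which is why they are sorted before being compared with the latent roots found successively.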

5. Acceleration by powering the matrix. In a later paper 
Hotelling pointed out that his process of finding the load- 
ings of the principal components can be much expedited 
by analysing, not the matrix of correlations itself, but its 
square, or fourth, eighth, or sixteenth power, got by 
repeated squaring (Hotelling, 1935b). Squaring a symmetrical
matrix is a special case of matrix multiplication
(see Chapter VII, Section 8) : it is done by finding the 
" inner products " (see footnote, page 31) of each pair of 
rows, including each row with itself, and setting the 
results down in order. Applying this to the correlation 
matrix : 

we see that the inner product of the first row with itself
is 1.36 ; of the first row with the second, 1.14 ; and so on.
Setting these down in order, we get for the matrix squared :

Exactly the same process is applied to this, beginning
with guessed multipliers, as we applied to the original 
matrix. The multipliers, however, settle down twice as 
rapidly towards their final values, which are the same here 
as there. We have finally : 

The " latent root," however, or largest total, 4.831598, is
the square of the former latent root, 2.198090, so that its
square root must be taken before we complete finding the 
loadings. 
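The effect of squaring can be tried directly. The following is a sketch under assumptions: `R` is the correlation matrix reconstructed from figures quoted in the text (it is not printed in this chapter), the stopping tolerance is arbitrary, and the helper name `iterate` is hypothetical.

```python
import numpy as np

# Assumed correlation matrix (reconstructed, not printed in this chapter).
R = np.array([
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
])

def iterate(M, guess, tol=1e-10, max_iter=1000):
    """Hotelling's iteration, also reporting how many steps were needed."""
    m = np.asarray(guess, dtype=float)
    for step in range(1, max_iter + 1):
        totals = M @ m
        new_m = totals / totals.max()
        change = np.max(np.abs(new_m - m))
        m = new_m
        if change < tol:
            break
    return m, (M @ m).max(), step

guess = [0.8, 1.0, 1.0, 0.7]
R_squared = R @ R                     # inner products of each pair of rows
m1, root1, steps1 = iterate(R, guess)
m2, root2, steps2 = iterate(R_squared, guess)
root_from_square = np.sqrt(root2)     # take the square root before use
```

The multipliers `m1` and `m2` agree, `steps2` is smaller than `steps1`, and `root_from_square` recovers the latent root of the original matrix.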

In exactly the same way the squared matrix may be 
again squared, and again and again, before we analyse it. 
The more we square it, the quicker the Hotelling iteration 
process works. The end multipliers are always the same, 
but the " root " is the same power of the root we need as 
is the matrix of the original matrix. 

A still further acceleration of the process is due to Cyril 
Burt, who observed that as the matrix is repeatedly 
squared it becomes more and more nearly hierarchical, 
including the diagonal cells (Burt, 1937a). This is due 
to the largest factor increasingly predominating as it is 
" powered," especially if the largest latent root is widely 
separated from the others. In consequence, the square 
roots of the diagonal cells become more and more nearly 
in the ratio of the Hotelling multipliers, and form an 
excellent first guess for the latter. When our matrix 
is squared twice again, giving the eighth power, it 
becomes : 

      108.78   140.67   140.67    88.54
      140.67   182.03   182.03   114.61
      140.67   182.03   182.03   114.61
       88.54   114.61   114.61    72.38

and the square roots of its diagonal members are

       10.429   13.492   13.492   8.508

which are in the ratio

        .7730   1   1   .6306

very near indeed to the Hotelling final multipliers

        .772865   1   1   .629811
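Burt's shortcut is easy to reproduce. The sketch below assumes the correlation matrix reconstructed from figures quoted in the text (it is not printed in this chapter) and uses NumPy's `matrix_power` in place of three hand squarings.

```python
import numpy as np

# Assumed correlation matrix (reconstructed, not printed in this chapter).
R = np.array([
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
])

# Eighth power of the matrix: squared, squared again, and squared once more.
R8 = np.linalg.matrix_power(R, 8)

# Square roots of the diagonal cells, divided by their largest member,
# serve as a first guess at the Hotelling multipliers.
first_guess = np.sqrt(np.diag(R8))
first_guess = first_guess / first_guess.max()
```

Under this assumption `first_guess` is already within about one unit in the third decimal place of the final multipliers, so very few further iterations are needed.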

Hotelling gives a method of finding the residues, for the 
purpose of calculating the next factor loadings, from the 
" powered " matrix. But it may be so nearly perfectly 
hierarchical that this fails unless an enormous number of 
decimals have been retained, and it is in practice best to 
go back to the original matrix and obtain the residues 
from it. Their matrix can in turn be squared, and so on. 
Other and very powerful methods of acceleration will 
be found described in Aitken, 1937b.

6. Properties of the loadings. If all the Hotelling com- 
ponents are calculated accurately, their loadings ought 
completely to exhaust the variance of each test ; that is, 
the sum of the squares of the loadings in each row should 
be unity. The sum of the squares of the loadings in each 
column equals the " latent root " corresponding to that 
column, and the sum of the four latent roots is exactly 
equal to the number of tests. Each latent root represents 
the part of the whole variance of all the tests which has 
been " taken out " by that factor. Thus the first factor 
" takes out " 55 per cent., the first two factors together 
75.6 per cent., of the variance of the original scores. The
four factors account for all the variance. 

If we turn back to Chapter II, where we made a Thurstone 
analysis of this same battery of four tests into two common 
factors and four specifics (six factors in all), we see, in the 
table on page 35, that the two first Thurstone factors 
" take out " 1.7652 and .0515 respectively, that is, 44.1
per cent. and 1.3 per cent. of the variance of the four tests, much less
than the two first Hotelling factors account for. Because
of this, the two first Hotelling factors will reproduce the 
original scores much better than the two Thurstone factors 
will. On the other hand, the two Thurstone factors 
reproduce the correlations exactly, while it takes all four 
Hotelling factors to do this. 

The correlations which correspond to the loadings given 
in the table on page 73 are obtained by finding the 
" inner product " of each pair of rows. Applying this to 
the table we find the correlation $r_{24}$, say, to be

\[ .856836 \times .539645 - .135197 \times .826092 - .312332 \times .162323 - .387298 \times 0 = .300000 \]

In this way, as we said above, the loadings of the four 
Hotelling factors will exactly reproduce the correlations 
we began with. If, however, we have stopped the analysis 
after we have found only two principal components (or 
factors), these two would have reproduced the correlations 
only approximately. For example, for $r_{24}$ we should only
have

\[ .856836 \times .539645 - .135197 \times .826092 = .350702 \quad \text{instead of } .300000 \]
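The properties listed in this section can be verified in one pass. This is a sketch under assumptions: the correlation matrix `R` is reconstructed from figures quoted in the text (it is not printed in this chapter), and the loadings are obtained with NumPy's `eigh` routine instead of Hotelling's iteration, which should give the same components apart from the arbitrary sign of each column.

```python
import numpy as np

# Assumed correlation matrix (reconstructed, not printed in this chapter).
R = np.array([
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
])

roots, vectors = np.linalg.eigh(R)           # returned in ascending order
order = np.argsort(roots)[::-1]              # largest latent root first
roots, vectors = roots[order], vectors[:, order]
loadings = vectors * np.sqrt(roots)          # scale each column by sqrt(root)

row_sums = np.sum(loadings ** 2, axis=1)     # variance of each test
col_sums = np.sum(loadings ** 2, axis=0)     # latent root of each factor
cumulative_percent = 100 * np.cumsum(roots) / roots.sum()

# Correlation of tests 2 and 4 as the inner product of their rows,
# first with all four factors, then with the first two only.
r24_all = loadings[1] @ loadings[3]
r24_two = loadings[1, :2] @ loadings[3, :2]
```

Each entry of `row_sums` is unity, `col_sums` reproduces the latent roots, and the roots add up to the number of tests.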

Before we leave the table of Hotelling loadings, we may 
note that the signs of any column of the loadings can be 
reversed without changing either the variances or the 
correlations. Reversing the signs in a column merely 
means that we measure that factor from the opposite end, 
as we might rank people either for intelligence or stupidity 
and get the same order, but reversed. We will usually 
desire to call that direction of a factor positive which most 
conforms with the positive direction of the tests them- 
selves, and therefore we will usually make the largest 
loading in each column positive. 

All the loadings of Hotelling's first factor are, in an 
ordinary set of tests, positive. Of the other loadings, 
about half are negative. Thurstone's first analysis, it will
be remembered, also gave a number of negative loadings
to the factors after the first, but he rotated his factors until
these disappeared. 

7. Calculation of a man's principal components. Esti- 
mation unnecessary. The Hotelling components have one 
other advantage over other kinds of factors that we did 
not mention in Section 3. They can be calculated exactly 
from a man's scores (provided that unities and not 
communalities are used in the diagonal, as was the case in 
Hotelling's exposition), whereas Spearman or Thurstone
factors can only be estimated. This is because the Hotel- 
ling components are never more numerous than the tests, 
whereas the Thurstone or Spearman factors, including the 
specifics, are always more numerous than the tests. For 
the Hotelling components, therefore, we always have just 
the same number of equations as unknowns, whereas we 
have more unknowns than equations in the Spearman- 
Thurstone system. 

We have hitherto given the analysis of tests into factors 
in the form of tables of loadings, or matrices of loadings, 
as we may call them, adopting the mathematical term. 
But we can alternatively write them out as " specification 
equations," as we shall call them. Thus the table on 
page 73 would be written 

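\[
\begin{aligned}
z_1 &= .662218\,y_1 - .323324\,y_2 + .675967\,y_3 \\
z_2 &= .856836\,y_1 - .135197\,y_2 - .312332\,y_3 - .387298\,y_4 \\
z_3 &= .856836\,y_1 - .135197\,y_2 - .312332\,y_3 + .387298\,y_4 \\
z_4 &= .539645\,y_1 + .826092\,y_2 + .162323\,y_3
\end{aligned}
\]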



Here $z_1$, $z_2$, $z_3$, and $z_4$ stand for the scores in the four
tests, measured in standard units ; that is, measured from
the mean in units of standard deviation. The factors
$y_1$, $y_2$, $y_3$, and $y_4$ are also supposed to be measured in such
units. These specification equations enable us to calculate 
any man's standard score in each test if we know his 
factors, and since there are just as many equations as 
factors, they can be solved for the y's and enable us to 
calculate, conversely, any man's factors if we know his 
scores in the tests. The solution to these Hotelling equa- 
tions for the y's happens to be peculiarly simple, as we 
shall prove in the Appendix, Section 7. It is as follows :

\[
\begin{aligned}
y_1 &= (.662218\,z_1 + .856836\,z_2 + .856836\,z_3 + .539645\,z_4) \div 2.198090 \\
y_2 &= (-.323324\,z_1 - .135197\,z_2 - .135197\,z_3 + .826092\,z_4) \div .823526 \\
y_3 &= (.675967\,z_1 - .312332\,z_2 - .312332\,z_3 + .162323\,z_4) \div .678383 \\
y_4 &= (-.387298\,z_2 + .387298\,z_3) \div .300000
\end{aligned}
\]

The table on page 73, therefore, serves a double purpose. 
Read horizontally it gives the composition of each test in 
terms of factors. Read vertically it gives the composition 
of each factor in terms of tests, if we divide the result by 
the root at the foot of the column.* 

Suppose, for example, that a man or child has the fol- 
lowing scores in the four tests 

         1.29   .36   .72   1.03.

This is evidently a person above the average in each test,
since the scores are all positive. His factors will be
obtained by substituting these scores for the z's in the
above equations, with the result

\[
\begin{aligned}
y_1 &= 1.062504 \\
y_2 &= .349441 \\
y_3 &= 1.034624 \\
y_4 &= .464757
\end{aligned}
\]

(Of course, in practical work six decimal places would be 
absurd. They are given here because we are using this 
artificial example to illustrate theoretical points, in place 
of doing algebraic transformations, and they need, there- 
fore, to be exact.) 

If these values for the factors are now inserted in the 
specification equations opposite, the scores z in the test 
will be reproduced exactly (1.29, .36, .72, and 1.03).

Notice, too, that if we have stopped our analysis at less 
than the full number of Hotelling factors, we can never- 
theless calculate these factors for any person exactly. As 
soon as we have the first column of the table on page 73, 
we can calculate $y_1$ for anyone whose scores z we know.

* If the analysis has been performed with " reliabilities " in the
diagonal cells instead of units, the statement in the text still holds
(Hotelling, 1933, 498). If on correlations corrected for " attenuation,"
the matter is more complicated (ibid. 499-502).

Had we done this with the person whose scores are given
above, we should have summarized his ability in these four
tests by the one statement

\[ y_1 = 1.062504 \]

This would have been an incomplete statement, but it is 
the best single statement that can be arrived at. If we 
attempt to reproduce the scores z from this one factor alone, 
we can use only the first term in each of the specifica- 
tion equations on page 78. These give for the scores 

              .704   .910   .910   .573
instead of   1.29    .36    .72   1.03, the true values,

a pretty bad shot, as the reader will agree. But bad as it 
is, it is better than any other one factor will provide, as 
we shall show later after we have considered how to 
estimate Spearman and Thurstone factors. 
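The arithmetic of this section can be retraced as follows. The loadings, latent roots, and scores are those quoted in the text; the variable names are hypothetical, and the sketch simply reads the table of loadings vertically to obtain the factors and horizontally to reproduce the scores.

```python
import numpy as np

# Loadings of the four tests (rows) on the four components (columns),
# and the latent roots, as quoted in the text.
loadings = np.array([
    [0.662218, -0.323324,  0.675967,  0.0],
    [0.856836, -0.135197, -0.312332, -0.387298],
    [0.856836, -0.135197, -0.312332,  0.387298],
    [0.539645,  0.826092,  0.162323,  0.0],
])
roots = np.array([2.198090, 0.823526, 0.678383, 0.300000])

z = np.array([1.29, 0.36, 0.72, 1.03])    # the person's standard scores

# Read each column vertically and divide by the root at its foot.
y = loadings.T @ z / roots

# Read each row horizontally to reproduce the scores from all four factors.
z_exact = loadings @ y

# Use only the first term of each specification equation.
z_one_factor = loadings[:, 0] * y[0]
```

The first factor `y[0]` needs only the first column of loadings and the first root, which is why it can be calculated exactly even when the analysis has been stopped early.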

It will be seen from these first chapters that the different 
systems of factors proposed by different schools of " fac- 
torists " have each their own advantages and disadvan- 
tages, and it is really impossible to decide between them 
without first deciding why we want to make factorial 
analyses at all. This fundamental question we will devote 
some pages to in later chapters. But there are still several 
things we must do in preparation, and we turn next to a 
matter which has wider applications than in factorial 
analysis, namely the method of estimating one quality 
from measurements of other qualities with which it is 
correlated. This, for example, is the problem before those 
who give vocational advice to a man after putting him 
through various tests, or who give educational advice (or
more peremptory instructions) to English children of 
eleven years of age after examining them in English, 
arithmetic, and perhaps with an " intelligence test," sorting 
them into those who may attend a secondary school, those 
who go to a central school, and those who remain in an 
elementary school. 



PART II 
THE ESTIMATION OF FACTORS 

To simplify and clarify the exposition, errors due to 
sampling the population of persons are in Parts I and II 
assumed to be non-existent. 



CHAPTER VI 

ESTIMATION AND THE POOLING SQUARE 

1. Correlation coefficient as estimation coefficient. A corre- 
lation coefficient indicates the degree of resemblance 
between two lists of marks : and therefore it also indicates 
the confidence with which we can estimate a man's position 
in one such list x if we know his position in the other y. 
If the correlation between two lists is perfect (r = 1), 
we know that his standardized score * in the one list is 
exactly the same as in the other (x = y). 

* A test score always means a standardized score unless the
contrary is stated. But estimates are not in standard measure in
general.

If the correlation between the two lists is zero (r = 0),
then the knowledge of a man's position in the one list tells
us nothing whatever about his position in the other list.
If we are compelled to make an estimate of that, we can
only fall back on our knowledge that most men are near
the average and few men are very good or very bad in any
quality. We have, therefore, most chance of being correct
if we guess that this man is average in the unknown test.
(x = 0. The average mark we have agreed to call zero ;
marks above average, positive ; marks below average,
negative.)

In the first case, when r = 1, we are justified in equating
his unknown score x to his known score y

\[ x = y \]

In the second case, when r = 0, we are compelled by our
ignorance to take refuge in

\[ x = 0 \text{ or average.} \]

Both these statements can be summed up in the one
statement

\[ \hat{x} = ry \]

where the circumflex mark over the x is meant to indicate
that this is an estimated, not a measured, value. If, now,
we consider a case between these, where the correlation is 
neither perfect nor zero, it can be shown that this equation 
still holds, provided each score is measured in standard 
deviation units. Since r is always a fraction, this means 
that we always estimate his unknown x score as being 
nearer the average than his known y score. That is 
because we know that men tend to be average men. If 
this man's y score is high, say

\[ y = 2 \]

(two standard deviations above the average), and if the
correlation between the qualities x and y is known to be
r = .5, we guess his position in the x test as being

\[ \hat{x} = ry = .5 \times 2 = 1 \]

i.e. only one standard deviation above the average. This
is a guess influenced by our two pieces of knowledge,
(1) that he did very well in Test y, which is correlated with
Test x, and (2) that most men get round about an average
score (zero). It is a compromise, an estimate. It will
often be wrong ; indeed, very seldom will it be exactly 
right. But it will be right on the average, it will as often 
be an underestimate as an overestimate, in each array 
of men who are alike in y. The correlation coefficient, 
then, is an estimation coefficient for tests measured in 
standard deviation units. 
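The text states only that the rule "can be shown" to hold for intermediate correlations. One standard route, offered here as a sketch and not necessarily the derivation the author has in mind, is least squares: choose the multiplier b that makes the mean squared error of the estimate by as small as possible, with x and y both in standard measure so that their mean squares are unity and their mean product is r.

```latex
\begin{aligned}
E\big[(x - by)^2\big] &= E[x^2] - 2b\,E[xy] + b^2\,E[y^2] = 1 - 2br + b^2, \\
\frac{d}{db}\big(1 - 2br + b^2\big) &= -2r + 2b = 0 \;\Longrightarrow\; b = r, \\
E\big[(x - ry)^2\big] &= 1 - r^2 .
\end{aligned}
```

On this reading the estimate ry leaves a mean squared error of 1 - r^2; with r = .5 that is .75, which is one way of seeing why the estimate is seldom exactly right.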

2. Three tests. Suppose now that we have three tests 
whose intercorrelations are known, and that a man's scores 
on two of them, y and z, are known. We wish to estimate 
what his score will most probably be in the other test, x. 
x need not be a test in the ordinary sense of the word, but 
may be an occupation for which the man is a candidate 
or entrant. According as we use his known y or his 
known z score, we shall have two estimates for his x score. 
To fix our ideas, let us take definite values for the correla- 
tions, say : 

              x      y      z
        x    1.0     .7     .5
        y     .7    1.0     .3
        z     .5     .3    1.0

The two estimates for his x are then

\[ \hat{x} = .7y \]

\[ \hat{x} = .5z \]

and of these we shall have rather more confidence in the 
estimate associated with the higher correlation. But we 
ought to have still more confidence in an estimate derived 
from both y and z. Such an estimate could use not only 
the knowledge that y and z are correlated with x, but also 
the knowledge that they are correlated to an extent of 
r = .3 with each other. Just to take the average of the
above two separate estimates will not utilize this knowledge,
nor will it utilize the fact that the estimate from y (r = .7)
is more worthy of confidence than the estimate from
z (r = .5).

What we want is to know how to combine the two scores 
y and z into a weighted total 

(by + cz) 

which will have the highest possible correlation with x. 
Such a correlation of a best-weighted total with another 
test is called a multiple correlation. From such a weighted 
total of his two known scores we could then estimate the 
man's x score more accurately than from either the y or 
the z score alone. It must use all the information we have, 
including our information that y and z correlate to an 
amount r = .3.

3. The straight sum and the pooling square. In order to 
answer this question, we shall first consider the problem 
of finding the correlation of the straight unweighted sum 
of the scores y + z with x. This is the simplest form of a 
problem to which a general answer was given by Professor 
Spearman (Spearman, 1913). 

We shall put his formula into a very simple form, which 
we may call a pooling square. In our present instance we 
want to find the correlation of y + z with x (all of these 
being, we are assuming, measured in standard deviation 
units). We divide the matrix of correlations by lines 
separating the " criterion " x from the " battery " y + z 
thus : 



                   x  |    y      z
            x     1.0 |    .7     .5
                 -----+--------------
            y      .7 |   1.0     .3
            z      .5 |    .3    1.0

In each of the quadrants of this pooling square (with 
unities in the diagonal, be it noted) we are going to form 
the sum of all the numbers, and we shall indicate these 
sums by the letters : 

A C 

C B 

(where C is the sum of the Cross-correlations between the 
battery y + z and the criterion x, which can be regarded 
as a second battery of one test only). 

Then the correlation of x with y + z is equal to

\[ \frac{C}{\sqrt{A \times B}} \]

which in our present example is

\[ \frac{.7 + .5}{\sqrt{(1) \times (1 + .3 + .3 + 1)}} = \frac{1.2}{\sqrt{2.6}} = .744 \]

so that the battery (y + z) has a rather better correlation
(.744) with x than has either of its members (.7 and .5).
From the straight sum of the man's scores in the two tests 
y and z we can therefore in this case get a better estimate 
of his score in x than we could get from either alone. 
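The text takes the pooling-square rule from Spearman's formula without proof. For the present case it can be checked from the ordinary definition of a correlation, given here as a sketch in which all three scores are assumed to be in standard measure (unit variance), so that covariances equal correlations.

```latex
\begin{aligned}
\operatorname{cov}(x,\; y+z) &= r_{xy} + r_{xz} = .7 + .5 = 1.2 = C, \\
\operatorname{var}(x) &= 1 = A, \\
\operatorname{var}(y+z) &= 1 + 1 + 2\,r_{yz} = 2.6 = B, \\
r_{x,\,y+z} &= \frac{\operatorname{cov}(x,\; y+z)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y+z)}}
             = \frac{C}{\sqrt{A\,B}} = \frac{1.2}{\sqrt{2.6}} \approx .744 .
\end{aligned}
```

Each quadrant sum of the pooling square is thus a variance or covariance of the pooled scores, which is why unities must stand in the diagonal.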

4. The pooling square with weights. We want, however, 
to know whether a weighted sum of y and z will give a still 
higher combined correlation with x. With sufficient 
patience, we could answer this by trial and error, for the 
pooling square enables us to find almost as easily the 
correlation of a weighted battery with the criterion.* Let 
us, for example, try the battery 3y + * For this purpose 

* The pooling square can also be used to find the correlations or 
covariances of weighted batteries with one another. Elegant 
developments are Hotelling's ideas of the most predictable criterion 
(1085a) and of vector correlation (1936). 
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we write the weights along both margins of the pooling 
square : 





1-0 


7 


5 


3 


7 


1-0 


.3 


1 


5 


3 


1-0 



and multiply both the rows and the columns by these weights 
before forming the sums A, B, and C. The result of the 
multiplications in our case is : 



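            1.0  |  2.1     .5
           ------+---------------
            2.1  |  9.0     .9
            .5   |   .9    1.0

which condenses to 

            1.0  |   2.6
           ------+--------
            2.6  |  11.8

and we therefore have 

$$\text{correlation} = \frac{2.6}{\sqrt{1.0 \times 11.8}} = \frac{2.6}{\sqrt{11.8}} = .757$$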

a higher value than .744 given by the simple sum. So we 
have improved our estimation of the man's x score, and 
estimates made by taking 3y + z would correlate .757 
with the measured values of x. 
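
The weighted pooling square lends itself to the same kind of sketch. The helper below, whose name and argument layout are assumptions made for illustration, multiplies every row and column by its marginal weight before summing the quadrants, as described above.

```python
import math

R = [
    [1.0, 0.7, 0.5],  # x (criterion)
    [0.7, 1.0, 0.3],  # y
    [0.5, 0.3, 1.0],  # z
]


def weighted_pooling(R, weights, n_criterion=1):
    """Return the quadrant sums A, B, C and the resulting correlation."""
    n = len(R)
    # Multiply row i and column j by their weights.
    W = [[weights[i] * R[i][j] * weights[j] for j in range(n)] for i in range(n)]
    crit = range(n_criterion)
    batt = range(n_criterion, n)
    A = sum(W[i][j] for i in crit for j in crit)
    B = sum(W[i][j] for i in batt for j in batt)
    C = sum(W[i][j] for i in crit for j in batt)
    return A, B, C, C / math.sqrt(A * B)


A, B, C, r = weighted_pooling(R, weights=[1, 3, 1])  # the battery 3y + z
print(round(A, 1), round(B, 1), round(C, 1), round(r, 3))
```

Trying other weight vectors in the last call reproduces the trial-and-error search mentioned in the text.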

5. Regression coefficients and multiple correlation. 
Similarly we could try other weights for y and z and search 
by trial and error for the best. There is, however, a general 
answer to this question, namely that the best weights for 
y and z are proportional to certain minor determinants of 
the correlation matrix. The weight for y is proportional to 
the minor left when we cross out the criterion column and 
the y row, the weight for z is proportional to minus the 
minor left when we similarly cross out the criterion column 
and the z row. The matrix of correlations with the 
criterion column deleted being: 



            x      .7     .5
            y     1.0     .3
            z      .3    1.0

the weight for y is therefore proportional to : 

$$\begin{vmatrix} .7 & .5 \\ .3 & 1.0 \end{vmatrix} = .55$$

and that for z is proportional to : 

$$-\begin{vmatrix} .7 & .5 \\ 1.0 & .3 \end{vmatrix} = .29$$

that is, they are as .55 : .29. To make these weights not 
merely proportional but absolute values we must divide 
each of them by the minor left when the row and column 
concerned with the " criterion " x are deleted, namely : 

$$\begin{vmatrix} 1.0 & .3 \\ .3 & 1.0 \end{vmatrix} = .91$$

so that these absolute best weights, for which the technical 
name is " regression coefficients," are 

$$\frac{.55}{.91}\,y + \frac{.29}{.91}\,z = .6044y + .3187z$$

We are inviting the reader to take this method of calculat- 
ing the regression coefficients on trust ; but he can at least 
satisfy himself that when applied to the pooling square they 
give a higher correlation of battery with criterion than any 
other weights do. The result of multiplying the y column 
and row by .6044, and the z column and row by .3187, is 
the following : 

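                     1      .6044   .3187

            1       1.0  |   .7      .5
                   ------+----------------
          .6044     .7   |  1.0      .3
          .3187     .5   |   .3     1.0

which becomes 

            1.0000  |  .4231   .1593
           ---------+-----------------
             .4231  |  .3653   .0578
             .1593  |  .0578   .1015

and condenses to 

            1.0000  |  .5824
           ---------+---------
             .5824  |  .5824

$$\text{Multiple correlation} = \frac{.5824}{\sqrt{1.0000 \times .5824}} = \sqrt{.5824} = .763 = r_m, \text{ say, which}$$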

is higher than any other weighting will produce, if the reader 
cares to try others. Notice the peculiarity of the pooling 
square with regression coefficients as weights, that C = B 
(.5824 = .5824). We can deduce that the inner product of 
the regression coefficients with the correlation coefficients 
gives the square of the multiple correlation 

$$.604 \times .7 + .319 \times .5 = .583 = r_m^2$$

Indeed, we can take this as forming one reason for using 
.604 and .319, and not any other numbers proportional to 
them, although the latter would give the same order of 
merit. We want our estimates of x not merely to be as 
highly correlated with the true values of x as is possible, 
but also to be equal to them on the average in the long 
run, in the sense that our overestimations will, in each 
array of men who have the same y and z, be as numerous 
as our underestimations, and this is achieved by using not 
merely .55 and .29 as weights, but .55 ÷ .91 and .29 ÷ .91. 
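
The rule of minors for two tests can be checked with a short Python sketch. The variable names are chosen here for illustration only; the three correlations are those of the worked example.

```python
import math

r_xy, r_xz, r_yz = 0.7, 0.5, 0.3


def det2(a, b, c, d):
    """Determinant of the 2 x 2 array [[a, b], [c, d]]."""
    return a * d - b * c


# Matrix with the criterion column deleted has rows
#   x: (r_xy, r_xz),  y: (1, r_yz),  z: (r_yz, 1).
minor_y = det2(r_xy, r_xz, r_yz, 1.0)    # y row crossed out
minor_z = -det2(r_xy, r_xz, 1.0, r_yz)   # z row crossed out, sign reversed
minor_0 = det2(1.0, r_yz, r_yz, 1.0)     # criterion row and column deleted

b_y = minor_y / minor_0                  # regression coefficient of y
b_z = minor_z / minor_0                  # regression coefficient of z

# Inner product of coefficients and correlations: squared multiple correlation.
r_m_squared = b_y * r_xy + b_z * r_xz
r_m = math.sqrt(r_m_squared)
print(round(b_y, 4), round(b_z, 4), round(r_m, 3))
```

The last two lines use the property noted above, that with regression coefficients as weights the quadrant sums C and B coincide.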

6. Aitken's method of pivotal condensation. When there 
are more than two tests y and z in the battery, the applica- 
tion of the above rules becomes increasingly laborious. It 
is desirable, therefore, to have a routine method of calcu- 
lating regression coefficients which will give the result as 
easily as possible even in the case of a team of many tests. 
The method we shall adopt (Aitken, 1937a) is based upon 
the calculation of tetrads, as already used in our Chapter II. 
We shall first calculate the above regression coefficients 
again by this method. Delete the criterion column in the 
matrix of correlations, transfer the criterion row to the bottom, 
and write the resulting oblong matrix in the top left-hand 
corner of the sheet of calculations, preferably on paper 
ruled in squares : 

                                                  Check
                                                  Column
    A     (1.0)    .3   |   -1        .       |    .3
           .3     1.0   |    .       -1       |    .3
           .7      .5   |    .        .       |   1.2

    B             (.91) |    .3      -1       |    .21
                  1.00  |    .3297   -1.0989  |    .2308
                   .29  |    .7       .       |    .99

    C                   |    .604     .319    |    .923

On the right of the oblong matrix of correlation coefficients 
we rule a middle block of columns of the same 
number, here two, and on the right of all a check column. 
The columns of the middle block we fill with a pattern 
of minus ones diagonally as shown, leaving the other cells 
empty,* including the bottom row. In the check column 
we write the sum of each row. The top left-hand number 
of all we mark as the " pivot." Slab B of the calculation 
is then formed from slab A by writing down, in order as 
they come, all the tetrad-differences of which the pivot in 
A is one corner. Thus the first row of slab B is calculated 
thus 

    1 × 1    − .3 × .3   =  .91
    1 × 0    − .3 × (−1) =  .3
    1 × (−1) − .3 × 0    = −1
    1 × .3   − .3 × .3   =  .21

and the row is checked by noting that .21 is the sum of the 
others. Immediately below this first row a second version 
of it is written, with every member divided by the first 
(.91). This is to facilitate the calculation of slab C by 
having unity again as a pivot. The second row of slab B is 
then formed, beginning with 

    1 × .5 − .7 × .3 = .29

Throughout the whole calculation, except for the division 
of the first row, only one operation needs to be performed, 
namely the computing of tetrad-differences, beginning with 
the pivot. 

The same operation is then repeated to give slab C, 
using the modified first row of B, with pivot unity. 

This procedure goes on, slab after slab, until no numbers 
remain in the left-hand block. There being only three 
tests in all in our example, this happens at slab C. The 
middle block then gives the regression coefficients .604 and 
.319, with their proper signs, all ready for use. Throughout 
the calculation the check column detects any blunder in 
each row. 
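
For readers who prefer to let a machine do the tetrad-differences, the routine can be sketched in Python as follows. This is a minimal sketch of the procedure as described in this section, not Aitken's own formulation: the function name, the use of lists of lists, and the tolerance used for the check column are all assumptions made for illustration.

```python
def aitken_regression(battery_corr, criterion_corr):
    """Regression coefficients by pivotal condensation (illustrative sketch).

    battery_corr   -- intercorrelations of the battery tests (square list)
    criterion_corr -- correlation of each battery test with the criterion
    """
    n = len(battery_corr)
    # Slab A: battery correlations, then a middle block with -1 on the diagonal.
    rows = [list(battery_corr[i]) + [-1.0 if j == i else 0.0 for j in range(n)]
            for i in range(n)]
    # The criterion row goes at the bottom, with an empty middle block.
    rows.append(list(criterion_corr) + [0.0] * n)
    # Check column: the sum of each row.
    rows = [row + [sum(row)] for row in rows]

    while len(rows) > 1:
        pivot_row = rows[0]
        unit_row = [v / pivot_row[0] for v in pivot_row]  # pivot made unity
        next_rows = []
        for row in rows[1:]:
            # Tetrad-differences having the unit pivot as one corner.
            new = [row[k] - row[0] * unit_row[k] for k in range(1, len(row))]
            # The check entry must still equal the sum of the other entries.
            assert abs(sum(new[:-1]) - new[-1]) < 1e-9, "check column disagrees"
            next_rows.append(new)
        rows = next_rows

    return rows[0][:n]  # the middle block of the last slab


coefficients = aitken_regression([[1.0, 0.3], [0.3, 1.0]], [0.7, 0.5])
print([round(b, 3) for b in coefficients])
```

Each pass of the loop produces one slab: the first row is divided by its pivot, the remaining rows are condensed, and the left-hand block loses one column.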

When the number of tests in the battery is large, the 
calculation of the regression coefficients is a laborious 
business, but probably less so by this method than by 
any other. It will be clear to the reader that so long a 
calculation is not worth performing unless the accuracy of 
the original correlation coefficients is high. Only very 
accurate values can stand such repeated multiplication, 
etc., without giving untrustworthy results (Etherington, 
1932). In other words, regression coefficients have a 
rather high standard error. 

* The dots represent zeros. 

7. A larger example. Next we give in full the calculation 
of the regression coefficients in a slightly larger example, 
though one still much smaller than a practical scheme of 
vocational advice would involve. Here $z_0$ is the " occupation," 
and $z_1$, $z_2$, $z_3$, and $z_4$ are tests. To give the 
example an air of reality, these and their intercorrelations 
are taken from Dr. W. P. Alexander's experimental study, 
Intelligence, Concrete and Abstract (Alexander, 1935). 
They were * : 

    z_1  Stanford-Binet test ; 

    z_2  Thorndike reading test ; 

    z_3  Spearman's analogies test in geometrical figures ; 

    z_4  A picture-completion test. 

But the occupation is a pure invention, for purposes of this 
illustration only. The correlation matrix is : 

                z_0     z_1     z_2     z_3     z_4

        z_0    1.00     .72     .58     .41     .63
        z_1     .72    1.00     .69     .49     .39
        z_2     .58     .69    1.00     .38     .19
        z_3     .41     .49     .38    1.00     .27
        z_4     .63     .39     .19     .27    1.00



* In this, as in other instances where data for small examples are 
taken from experimental papers, neither criticism nor comment is 
in any way intended. Illustrations are restricted to few tests for 
economy of space and clearness of exposition, but in the experiments 
from which the data are taken many more tests are employed, and 
the purpose may be quite different from that of this book. 

The fact that we possess these correlations means that we 
have given these tests to a sufficiently large number of 
persons whose ability in the occupation is also known. 
The occupation can be looked upon as another test, in 
which marks can be scored. In an actual experiment, 
obtaining marks for these persons' abilities in the occupation 
is in fact one of the most difficult parts of the work. 
We can now find by Aitken's method the best weights for 
Tests $z_1$ to $z_4$ to make their weighted sum correlate as 
highly as possible with $z_0$. To make the arithmetic as easy 
as possible to follow in an illustration, the original correlation 
coefficients are given to two places of decimals only, 
and only three places of decimals are kept at each stage of 
the calculation. The previous explanation ought to enable 
the reader to follow. As an additional help, take the 
explanation of the value .153 in the middle of slab D. It 
is obtained thus from slab C 

    1 × .158 − .050 × .106 = .153

and is typical of all the others. Except for the division 
of each first row, only one kind of operation is required 
through the whole calculation, which becomes quite 
mechanical. The numbers shown on the left in brackets 
are the reciprocals of .524, .757, .826, used as multipliers 
instead of dividing by the latter numbers, in obtaining the 
modified first rows. The process continues until the left-hand 
block is empty, when the regression coefficients 
appear in the middle block (see opposite page).* 

* The product of all the unconverted pivots, 1 × .524 × .757 × 
.826, is the value .328 of the determinant : 

$$\begin{vmatrix} 1.00 & .69 & .49 & .39 \\ .69 & 1.00 & .38 & .19 \\ .49 & .38 & 1.00 & .27 \\ .39 & .19 & .27 & 1.00 \end{vmatrix}$$

For a different method of finding the regression coefficients, with 
certain advantages, see Addendum, page 350. 

COMPUTATION OF REGRESSION COEFFICIENTS 
Aitken's Modified Method with Each Pivot converted to Unity 

                                                                               Check
A              (1)      .69     .49     .39  |   -1       .       .       .     |  1.57
               .69     1        .38     .19  |    .      -1       .       .     |  1.26
               .49      .38    1        .27  |    .       .      -1       .     |  1.14
               .39      .19     .27    1     |    .       .       .      -1     |   .85
               .72      .58     .41     .63  |    .       .       .       .     |  2.34

B   (1.908)           (.524)    .042   -.079 |    .690   -1       .       .     |   .177
                      1.000     .080   -.151 |   1.317   -1.908   .       .     |   .338
                       .042     .760    .079 |    .490    .      -1       .     |   .371
                      -.079     .079    .848 |    .390    .       .      -1     |   .238
                       .083     .057    .349 |    .720    .       .       .     |  1.210

C   (1.321)                   (.757)    .085 |    .435    .080   -1       .     |   .357
                              1.000     .112 |    .575    .106   -1.321   .     |   .472
                               .085     .836 |    .494   -.151    .      -1     |   .265
                               .050     .362 |    .611    .158    .       .     |  1.182

D   (1.211)                           (.826) |    .445   -.160    .112   -1     |   .225
                                      1.000  |    .539   -.194    .136   -1.211 |   .272
                                       .356  |    .582    .153    .066    .     |  1.158

E                                            |    .390    .222    .018    .431  |  1.061
                                                    Regression Coefficients

The result is that we find that the best prediction of a 
man's probable success in this occupation is given by the 
regression equation 

$$\hat{z}_0 = .390z_1 + .222z_2 + .018z_3 + .431z_4$$

We give a candidate the four tests, reduce his scores 
to standard measure by dividing by the known standard 
deviation of each test, insert these standard scores into 
this equation, and obtain an estimated score for him in 
the occupation. Thus the following three young men 
could be placed in their probable order of efficiency in this 
occupation from their test scores : 





                     Standard Scores in               Estimated
                 z_1     z_2     z_3     z_4            z_0

        Tom       .7      .2     -.5      0             .31
        Dick     -.4      .1      .3     -.8           -.47
        Harry     .2      .8      .6     1.3            .83
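
The use of the regression equation on individual candidates can be sketched in Python. The standard scores below are those of the table as they are read here from an imperfect print, so the signs of one or two entries are assumptions, checked only by the fact that they reproduce the estimates .31, -.47 and .83; the function and variable names are likewise illustrative.

```python
# Regression coefficients for z_1 .. z_4 found by Aitken's method.
weights = [0.390, 0.222, 0.018, 0.431]

# Standard scores in the four tests (as read from the table; partly assumed).
candidates = {
    "Tom":   [0.7, 0.2, -0.5, 0.0],
    "Dick":  [-0.4, 0.1, 0.3, -0.8],
    "Harry": [0.2, 0.8, 0.6, 1.3],
}


def estimate(scores, weights):
    """Estimated criterion score: weighted sum of the standard scores."""
    return sum(w * s for w, s in zip(weights, scores))


estimates = {name: estimate(scores, weights) for name, scores in candidates.items()}
order_of_merit = sorted(estimates, key=estimates.get, reverse=True)

for name in order_of_merit:
    print(name, round(estimates[name], 2))
```

Any set of weights proportional to these would give the same order of merit; the regression coefficients are needed only if the size of the estimate matters.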
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The multiple correlation of such estimates with the 
true values would be obtained by inserting the four 
correlation coefficients 

    .72    .58    .41    .63

instead of the z's in the regression equation, and taking 
the square root, thus 

$$.390 \times .72 + .222 \times .58 + .018 \times .41 + .431 \times .63 = .68847 = r_m^2, \qquad r_m = .83$$



Finally, we can, as we did in the former example, use 
the regression weights on a pooling square and see if we 
obtain this same multiple correlation of $r_m$ = .83 : 

                     1       .390    .222    .018    .431

            1       1.00  |   .72     .58     .41     .63
                   -------+--------------------------------
          .390       .72  |  1.00     .69     .49     .39
          .222       .58  |   .69    1.00     .38     .19
          .018       .41  |   .49     .38    1.00     .27
          .431       .63  |   .39     .19     .27    1.00



It will be remembered that we have to multiply each 
row and column by its appropriate weight, and then sum 
all the numbers in each quadrant. The easiest way of 
doing this in large pooling squares is to multiply the rows 
first, then add the columns and multiply the totals by the 
column weights, finally adding these products, thus : 

Multiply the rows : 



          Weight
            1        1.0000    .72      .58      .41      .63
                    --------------------------------------------
          .390        .2808    .3900    .2691    .1911    .1521
          .222        .1288    .1532    .2220    .0844    .0422
          .018        .0074    .0088    .0068    .0180    .0049
          .431        .2715    .1681    .0819    .1164    .4310
                    --------------------------------------------
          Sums        .6885    .7201    .5798    .4099    .6302

If we had kept all decimals these columnar sums would, 
since we are using regression coefficients as weights, have 
been exactly equal to the top row. With the actual figures 
shown, on multiplying the column totals and adding them, 
we find that the pooling square condenses to : 

            1.0000  |  .6885
           ---------+---------
             .6885  |  .6885

$$r_m = \frac{.6885}{\sqrt{.6885}} = .83 \text{ as before.}$$
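
The row-then-column procedure for a large pooling square can be sketched in Python as follows. The variable names are assumptions for illustration; the correlations and the three-decimal weights are those of the example, so the figures agree with the text only to the accuracy of that rounding.

```python
import math

R = [
    [1.00, 0.72, 0.58, 0.41, 0.63],  # z_0, the occupation
    [0.72, 1.00, 0.69, 0.49, 0.39],  # z_1
    [0.58, 0.69, 1.00, 0.38, 0.19],  # z_2
    [0.41, 0.49, 0.38, 1.00, 0.27],  # z_3
    [0.63, 0.39, 0.19, 0.27, 1.00],  # z_4
]
weights = [1.0, 0.390, 0.222, 0.018, 0.431]
n = len(R)

# Step 1: multiply each row by its weight.
weighted_rows = [[weights[i] * R[i][j] for j in range(n)] for i in range(n)]

# Step 2: add the battery rows column by column (the line marked "Sums").
column_sums = [sum(weighted_rows[i][j] for i in range(1, n)) for j in range(n)]

# Step 3: multiply the totals by the column weights and add within quadrants.
A = weights[0] * weighted_rows[0][0]
C = column_sums[0] * weights[0]
B = sum(column_sums[j] * weights[j] for j in range(1, n))

r_m = C / math.sqrt(A * B)
print([round(s, 4) for s in column_sums], round(r_m, 2))
```

With regression coefficients as weights, the battery column sums come out close to the criterion row .72, .58, .41, .63, which is the property remarked on in the text.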

8. The geometrical picture of estimation. Before we close 
this chapter it will be illuminating to consider what esti- 
mation of occupational ability means in terms of the 
geometrical picture of Chapter IV. Consider the illustra- 
tion used in the earlier pages of the present chapter, with 
the matrix : 





                 x       y       z

        x       1.0      .7      .5
        y        .7     1.0      .3
        z        .5      .3     1.0



Here x is the criterion, y and z are the tests. Each of 
them can be represented by a line vector, as explained in 
Chapter IV, with angles between these vectors such that 
their cosines are the above correlations. The three vectors 
will then be in an ordinary space of three dimensions. 

The two tests y and z themselves have, of course, vectors 
which lie in a plane : any two lines springing from the 
same point as origin lie in a plane. These are the two 
tests to which we subject the candidate, whose probable 
score in x we are then going to estimate. His two scores 
OY and OZ in y and z enable us to assign to this man a 
point P on the yz plane, a point so chosen that its projec- 
tions on to the y and z vectors give the scores made by 
him in those tests (see Figure 19). But we cannot say that 
this is his point in the three-dimensional space of x, y, and z. 
His point in that space may be anywhere on a line P'PP" 
at right angles to the plane yz. For from anywhere on 
that line, projections on to y and z fall on the points Y 
and Z. Yet the projection on to the vector x, which gives 
his score in the criterion test x, depends very much on the 
position of his point on the line P'PP". All the people 
represented by points on that line have the same scores 
in y and z but different scores in x, and our man may be 
any one of them. Before deciding what to do in these 




circumstances, let us consider this set of people P'PP'' in 
more detail. 

It will be remembered that the whole population of 
persons is represented by a spherical swarm of points, 
crowded together most closely round about the origin O, 
and falling off in density equally in all directions from 
that point. Every test vector is a diameter of this sphere, 
and the plane containing any two test vectors divides the 
spherical swarm into equal hemispheres. It follows that 
a line like P'PP" is a chord of the sphere at right angles to 
a diameter (the line OP), and consequently that it is 
peopled symmetrically on both sides of P, both upwards 
along PP' in our figure, and downwards along PP", the 
men on the line being most crowded near the point P itself. 
The average man of the array of men P'PP" (who are all 
alike in their scores in the two tests y and z) is therefore 
the man at P, and since we do not know exactly where 
our candidate's point is along P'PP", we take refuge in 
guessing that he is the average man of his group and is at 
the point P itself. From P, therefore, we drop a perpendicular 
on to the vector x, and take the distance OX as 
representing his estimated score in that test. This geo- 
metrical procedure corresponds exactly to the calculation 
we made, as a little solid trigonometry will show the 
mathematical reader. The non-mathematical reader must 
take it on trust, but the model may illuminate the calculation. 
In our numerical example, taking the angles whose 
cosines are the correlations, the angle between y and z is 
about 72½°, that between x and z is 60°, and that between 
x and y about 46°. It is worth the reader's while to draw 
y and z on a sheet of paper on the table, and to represent 
x by a knitting-needle rising at an angle above the table, 
making roughly angles of 46° with y and 60° with z. Any 
point P on the paper represents a person's scores in y and z, 
scores shared by all persons vertically above and below P. 
The projection of P on to the knitting-needle is $\hat{x}$, the 
estimate. It is the average of all the different scores x 
that a person with scores OY and OZ can have. The 
estimate will only be certain if the knitting-needle itself is 
on the table ; it will be less and less certain, the more the 
knitting-needle is inclined to the table. 

In Section 3 of Chapter IV we noted that the angles 
which three test vectors make with each other are impossible 
angles, if the determinant of the matrix of correlations 
becomes negative. Ordinarily, that determinant is posi- 
tive. In our present example we have, for example : 



$$\begin{vmatrix} 1.0 & .7 & .5 \\ .7 & 1.0 & .3 \\ .5 & .3 & 1.0 \end{vmatrix} = .38$$



Such a determinant, however, though it cannot be 
negative, can be zero, namely in the cases where the two 
smaller angles exactly equal the largest. In that case the 
three vectors lie in one plane ; the knitting-needle has 
sunk until it too lies on the table. In that case alone, 
when the determinant is zero, the " estimation " is certain, 
and all the people in the line P'PP" have not only the same 
scores in y and z, but also the same scores in x. The 
vanishing of the above determinant therefore shows that 
this is so. And in more than three dimensions, although 
we can no longer make a model, the vanishing of the 
determinant : 

$$\Delta = \begin{vmatrix} 1 & r_{01} & r_{02} & r_{03} & \cdots & r_{0n} \\ r_{01} & 1 & r_{12} & r_{13} & \cdots & r_{1n} \\ r_{02} & r_{12} & 1 & r_{23} & \cdots & r_{2n} \\ r_{03} & r_{13} & r_{23} & 1 & \cdots & r_{3n} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ r_{0n} & r_{1n} & r_{2n} & r_{3n} & \cdots & 1 \end{vmatrix}, \text{ say,}$$

shows that the criterion $z_0$ can be exactly estimated from 
the team $z_1, z_2, \ldots, z_n$. In fact, the multiple correlation 
$r_m$, which we have already learned to calculate in another 
way, can also be calculated as 

$$r_m = \sqrt{1 - \frac{\Delta}{\Delta_{00}}}$$

where $\Delta$ is the whole determinant, and $\Delta_{00}$ is the minor 
left after deleting the criterion row and column. This 
expression clearly becomes equal to unity when $\Delta$ = 0. 
In our small example x, y, z, we have 

$$\Delta = .38, \qquad \Delta_{00} = .91$$

$$r_m = \sqrt{1 - \frac{.38}{.91}} = \sqrt{.5824} = .763$$

as we already know it to be from page 88. 
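
The determinant form of the multiple correlation, and the angles of the knitting-needle model, can both be checked with a short Python sketch. The recursive `det` helper is an assumption introduced for illustration (expansion along the first row, adequate for small matrices only).

```python
import math


def det(M):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total


R = [
    [1.0, 0.7, 0.5],  # x (criterion)
    [0.7, 1.0, 0.3],  # y
    [0.5, 0.3, 1.0],  # z
]

delta = det(R)                               # the whole determinant
delta_00 = det([row[1:] for row in R[1:]])   # criterion row and column deleted
r_m = math.sqrt(1 - delta / delta_00)

# Angles between the test vectors: the angles whose cosines are the correlations.
angle_yz = math.degrees(math.acos(R[1][2]))
angle_xz = math.degrees(math.acos(R[0][2]))
angle_xy = math.degrees(math.acos(R[0][1]))

print(round(delta, 2), round(delta_00, 2), round(r_m, 3))
print(round(angle_yz, 1), round(angle_xz, 1), round(angle_xy, 1))
```

Were the determinant to vanish, the expression under the root would become unity, corresponding to the needle lying flat on the table.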

9. The " centroid " method and the pooling square. The 
pooling square, which we have learned to use in this 
chapter, enables us to see more clearly the nature of the 
factors first arrived at by Thurstone's " centroid " method. 
It will be remembered that in Chapter II, page 23, in a 
footnote we promised an explanation of this name " centroid " 
(or centre of gravity) method as applied to the 
calculations of factor loadings. 

Let us suppose that the tests $z_1$, $z_2$, $z_3$, and $z_4$ have the 
correlations shown, and let us by the aid of a pooling square 
find the correlation of each of them with the average of all. 
This means giving each test an equal weight in pooling it. 

                                      Equal Weights
                         z_1    |    z_1     z_2     z_3     z_4

               z_1        1     |     1      r_12    r_13    r_14
                       ---------+--------------------------------
               z_1        1     |     1      r_12    r_13    r_14
    Equal      z_2       r_12   |    r_12     1      r_23    r_24
    Weights    z_3       r_13   |    r_13    r_23     1      r_34
               z_4       r_14   |    r_14    r_24    r_34     1

The correlation of $z_1$ with the average of all is then 
obtained from the above pooling square, which condenses 
to : 

                1                       |   1 + r_12 + r_13 + r_14
       ---------------------------------+---------------------------------
         1 + r_12 + r_13 + r_14         |   Sum of all the cells of the
                                        |   table of correlations.

and the correlation coefficient is 

$$\frac{1 + r_{12} + r_{13} + r_{14}}{\sqrt{\text{above sum}}}$$
This, however, is exactly Thurstone's process applied 
to a table with full communalities of unity. The first 
Thurstone factor obtained from such a table is simply for 
each individual the average of his four test scores, and the 
method is called the " centroid " method, because " centroid " 
is the multi-dimensional name for an average 
(Vectors, Chapter III ; and see Kelley, 1935, 59). The 
vector, in our geometrical picture, which represents the 
first Thurstone factor, is in the midst of the radiating 
vectors which represent the tests, like the stick of a 
half-opened umbrella among the ribs. It does not, however, 
make equal angles with the test vectors unless these all 
make equal angles with each other. If several of them 
are clustered together, and the others spread more widely, 
the factor will lean nearer to the cluster. 
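
The pooling-square view of the first centroid factor, with unit communalities, can be sketched in Python. The four-test correlation table used below is borrowed from the battery of Section 7 purely as an illustrative assumption; the function name is likewise not from the text.

```python
import math

# Intercorrelations of four tests, with unities in the diagonal.
R = [
    [1.00, 0.69, 0.49, 0.39],
    [0.69, 1.00, 0.38, 0.19],
    [0.49, 0.38, 1.00, 0.27],
    [0.39, 0.19, 0.27, 1.00],
]


def first_centroid_loadings(R):
    """Correlation of each test with the equally weighted sum of all tests."""
    column_sums = [sum(row[j] for row in R) for j in range(len(R))]
    total = sum(column_sums)  # sum of all the cells of the table
    return [s / math.sqrt(total) for s in column_sums]


loadings = first_centroid_loadings(R)
print([round(v, 3) for v in loadings])
```

The test whose column sum is largest receives the highest loading, which is the arithmetical counterpart of the factor leaning towards a cluster of tests.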

In the foregoing explanation the communalities have 
been taken as unity, and the factor axis was pictured in 
the midst of the test vectors. If smaller communalities 
are used, the only difference is that a specific component 
of each test is discarded, and the first-factor axis must be 
pictured as in the midst of the vectors representing the 
other components of the tests. It can be shown that when 
communalities less than unity are used, if we bear in mind 
that the communal components of the tests are not then 
standardized, the pooling square gives the correlations 
exactly as before, if we use communalities instead of units 
in the diagonal. 

The first centroid factor is the average of the communal 
parts of the tests. 

The later factors in their turn are, in a sense, averages 
of the residues. There are, however, some complications, 
the first being that the average of the residues just as they 
stand is zero. The manner in which Thurstone circum- 
vents this has already been described in Chapter II. 

10. Conflict between battery reliability and prediction. 
Weighting a battery of tests, to give maximum correlation 
with a criterion (for example, to give the best prediction of 
future vocational or educational success), will alter the 
reliability of the battery, that is the correlation of the 
weighted battery with a similarly weighted duplicate bat- 
tery. The best weights for prediction will usually differ 
from the weights which give maximum battery reliability, 
so that there is a conflict between two " bests." 

Thomson (1940b) has described how to find the best 
weights for battery reliability, as a special case of Hotelling's 
" most predictable criterion " (Hotelling, 1935a, and 
see Thomson, 1947, and M. S. Bartlett, 1948), and Peel (1947) 
has given a simpler formula than Thomson's (see page 367 
in the Mathematical Appendix, section 9a). If there are 
only two tests in the battery, with reliabilities $r_{11}$, $r_{22}$ and 
correlating with one another $r_{12}$, then Peel's formula gives as 
the maximum attainable reliability the largest root $\mu$ of 
the equation 

$$\begin{vmatrix} r_{11} - \mu & r_{12}(1 - \mu) \\ r_{12}(1 - \mu) & r_{22} - \mu \end{vmatrix} = 0$$

that is 

$$\mu^2(1 - r_{12}^2) - \mu(r_{11} + r_{22} - 2r_{12}^2) + (r_{11}r_{22} - r_{12}^2) = 0.$$

If, for example, $r_{12}$ = .5, $r_{11}$ = .7 and $r_{22}$ = .8, the quadratic 
has roots .843 and .490, and a battery reliability of .843 
is attainable by using weights proportional to either row of 
the above determinant with $\mu$ = .843, taken reversed and 
with alternate signs, that is 

       .0785 and .1431 
    or .0431 and .0785 
    or 1 and 1.8 approximately. 

If as a check we set out a pooling square for the two batteries 
it will be 

                    1      1.8   |    1      1.8

            1      1.0     .5    |    .7     .5
           1.8      .5    1.0    |    .5     .8
                 ----------------+----------------
            1       .7     .5    |   1.0     .5
           1.8      .5     .8    |    .5    1.0

and if we multiply the rows and columns by the weights 
shown, and add together the quadrants, this reduces to 

            6.04   |  5.092
           --------+---------
            5.092  |  6.04

giving a battery self-correlation or reliability of 

$$\frac{5.092}{6.04} = .843 \text{ as expected.}$$
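
The two-test case of Peel's formula can be followed in a short Python sketch. The quadratic and the rule for the weights are those stated above; the variable names and the helper `battery_reliability`, which simply evaluates the pooling square of a battery against its duplicate, are assumptions made for illustration.

```python
import math

r11, r22, r12 = 0.7, 0.8, 0.5  # the two reliabilities and the intercorrelation

# Quadratic  mu^2 (1 - r12^2) - mu (r11 + r22 - 2 r12^2) + (r11 r22 - r12^2) = 0
a = 1 - r12 ** 2
b = -(r11 + r22 - 2 * r12 ** 2)
c = r11 * r22 - r12 ** 2
root = math.sqrt(b * b - 4 * a * c)
mu_max = (-b + root) / (2 * a)
mu_min = (-b - root) / (2 * a)

# Weights: first row of the determinant, reversed and with alternate signs.
w1 = r12 * (1 - mu_max)
w2 = -(r11 - mu_max)


def battery_reliability(w1, w2):
    """Pooling square of the weighted battery against its weighted duplicate."""
    A = w1 * w1 + 2 * w1 * w2 * r12 + w2 * w2               # within one battery
    C = w1 * w1 * r11 + 2 * w1 * w2 * r12 + w2 * w2 * r22   # between the two
    return C / A


print(round(mu_max, 3), round(mu_min, 3), round(w2 / w1, 2))
print(round(battery_reliability(1, 1.8), 3))
```

Here the two batteries have equal variances, so the denominator of the correlation reduces from the square root of A × A to A itself.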



Since there is the conflict mentioned above between such 
weights which increase battery reliability, and weights to 
make the battery agree as well as possible with a criterion, 
it would be an advance to possess some reasonably simple 
form of calculation to find weights to make the best compromise 
(see Thomson, 1940b, pages 864-5). 



CHAPTER VII 

THE ESTIMATION OF FACTORS BY REGRESSION 

1. Estimating a man's "g." So far, our discussion of 
estimation in Chapter VI has had nothing immediate to 
do with factorial analysis. We are next, however, going 
to apply these principles of estimation to the problem of 
estimating a man's Spearman or Thurstone factors, given 
his test scores. As we have already explained in Chapter 
V, there is no need to " estimate " Hotelling's factors ; they 
can be calculated without any loss of exactness because 
they are equal in number to the tests : and even if we 
analyse out only a few of them, they can be exactly 
calculated for a man from his test scores. When we say 
exactly here, we mean that the factors are known with the 
same exactness as the test scores which are our data. 

Spearman or Thurstone factors, however, are more 
numerous than the tests, and can therefore only be 
" estimated." Two men with the same set of test scores 
may have different Thurstone factors. All we can do is 
to estimate them, and since the test scores of the two men 
are the same, our estimates of their most probable factors 
will be the same. The problem does not differ essentially 
from the estimation of occupational success or of ability in 
any " criterion " test. The loadings of a factor in each 
test give the $z_0$ row and column of the correlation matrix. 
Let us first consider the case of a hierarchical battery of 
tests, and the estimation of g, taking for our example 
the first four tests of the Spearman battery used as illustra- 
tion in Chapter I, with these correlations : 





                z_1     z_2     z_3     z_4

        z_1    1.00     .72     .63     .54
        z_2     .72    1.00     .56     .48
        z_3     .63     .56    1.00     .42
        z_4     .54     .48     .42    1.00





These correspond, in the analogy with the ordinary cases 
of estimation of the first chapter of this part, to the tests 
given to a candidate. In those cases, however, there was 
a real criterion whose correlations with the team of tests 
were known, and formed the $z_0$ row and column of the 
matrix. Here the " criterion " is g, and it cannot be 
measured directly ; it can only be estimated in the manner 
we are now about to describe. We have here, therefore, 
no row and column of experimentally measured correlations 
for the criterion $z_0$ or g in the present case (Thomson, 
1934b, 94). From the hierarchical matrix of intercorrelations 
of the tests, however, we can calculate the 
" saturation " or " loading " of each test with the hypo- 
thetical g, and use these for our criterion column and row 
of correlations. We thus arrive at the matrix : 



                z_0     z_1     z_2     z_3     z_4

        z_0    1.00     .90     .80     .70     .60
        z_1     .90    1.00     .72     .63     .54
        z_2     .80     .72    1.00     .56     .48
        z_3     .70     .63     .56    1.00     .42
        z_4     .60     .54     .48     .42    1.00



and we want to know the best-weighted combination of 
the test scores $z_1$ to $z_4$ in order to correlate most highly 
with $z_0$ = g. The problem is now the same as one of 
ordinary estimation of ability in an occupation, and the 
mathematical answer is the same. We can, for example, 
use Aitken's method of finding the regression coefficients, 
although in this case, because of the hierarchical qualities 
of the matrix, there is, as we shall shortly see, an easier 
method. It is, however, illuminating for the student 
actually to work out the regression coefficients as in an 
ordinary case of estimation, as shown below. 

If, therefore, we know the scores $z_1$, $z_2$, $z_3$, and $z_4$ which 
a man has made in these four tests, we can estimate his g 
by the equation 

$$\hat{g} = .5531z_1 + .2595z_2 + .1602z_3 + .1095z_4$$

                                                                                   Check
           (1.00)    .72     .63     .54   |  -1.00      .        .        .     |  1.89
             .72    1.00     .56     .48   |    .      -1.00      .        .     |  1.76
             .63     .56    1.00     .42   |    .        .      -1.00      .     |  1.61
             .54     .48     .42    1.00   |    .        .        .      -1.00   |  1.44
             .90     .80     .70     .60   |    .        .        .        .     |  3.00

(2.0764)          (.4816)   .1064   .0912  |   .72     -1.00      .        .     |   .3992
                  1.0000    .2209   .1894  |  1.4950   -2.0764    .        .     |   .8289
                   .1064    .6031   .0798  |   .63       .      -1.00      .     |   .4193
                   .0912    .0798   .7084  |   .54       .        .      -1.00   |   .4194
                   .1520    .1330   .1140  |   .90       .        .        .     |  1.2990

(1.7253)                  (.5796)   .0596  |   .4709    .2209   -1.00      .     |   .3311
                          1.0000    .1028  |   .8124    .3811   -1.7253    .     |   .5712
                           .0597    .6911  |   .4037    .1894     .      -1.00   |   .3438
                           .0994    .0852  |   .6728    .3156     .        .     |  1.1730

(1.4599)                          (.6850)  |   .3552    .1666    .1030   -1.00   |   .3097
                                  1.0000   |   .5186    .2432    .1504   -1.4599 |   .4521
                                   .0750   |   .5920    .2777    .1715     .     |  1.1162

                                           |   .5531    .2595    .1602    .1095  |  1.0823
                                                   Regression Coefficients



The multiple correlation of such estimates in a large 
number of cases with the true values of g will be by analogy 
with our former case given by 

$$r_m^2 = .5531 \times .90 + .2595 \times .80 + .1602 \times .70 + .1095 \times .60 = .883$$

$$r_m = .940$$

We must remember, however, that such a correlation here 
is rather a fiction. We had in the former case the possibility 
of comparing our estimates with the candidate's 
eventual performance in the occupation or criterion $z_0$. 
Here we have no way of knowing g ; we only have the 
estimates. 

As before, we can check the whole calculation by a 
pooling square, thus : 

                      1      .5531   .2595   .1602   .1095

            1        1.00  |   .90     .80     .70     .60
                    -------+--------------------------------
          .5531       .90  |  1.00     .72     .63     .54
          .2595       .80  |   .72    1.00     .56     .48
          .1602       .70  |   .63     .56    1.00     .42
          .1095       .60  |   .54     .48     .42    1.00

Multiplying by the row weights and summing the 
columns condenses this to : 

                      1      .5531   .2595   .1602   .1095

                    1.000  |   .90     .80     .70     .60
                    -------+--------------------------------
                     .883  |   .900    .800    .700    .600

and multiplying by the column weights gives : 

            1.000  |  .883
           --------+--------
             .883  |  .883



showing that our calculation was exact to three places. 

Estimating g from a hierarchical battery is therefore, 
mathematically, exactly the same problem as estimating 
any criterion, and can be done arithmetically in the same 
way. Because of the special nature of the hierarchical 
matrix of correlations, however, with its zero tetrad- 
differences, there is an easier way of calculating the estimate 
of g, due to Professor Spearman himself (Abilities, xviii). 
For its equivalence mathematically to the above see 
Thomson (1934b, 94-5) and Appendix, paragraph 10. 

Meanwhile we shall illustrate it by an example which 
will at least show that it is equivalent in this instance. 
The calculation is best carried out in tabular form, and is 
based entirely on the saturations or loadings of the tests 
with g, which are also their correlations with g. 

                                                                       Regression
                                                                       Coefficients
    Test    r_ig    r_ig^2    1 - r_ig^2    r_ig^2         r_ig         1        r_ig
                                          ----------    ----------    ----- × ----------
                                          1 - r_ig^2    1 - r_ig^2    1 + S   1 - r_ig^2

     1       .9      .81        .19         4.2632        4.7368          .5533
     2       .8      .64        .36         1.7778        2.2222          .2596
     3       .7      .49        .51          .9608        1.3725          .1603
     4       .6      .36        .64          .5625         .9375          .1095

                                        S = 7.5643
                                    1 + S = 8.5643
                                1/(1 + S) =  .1168



The result, with much less calculation, is the same. 
The quantity S is of some importance in this formula. It 
is formed in the fourth column of the table, from which 
it will be seen that 



S = 





V 


4- 


1 


r 2 2 


1 


r\ 


2 ' 
'n 


1 ~ 


- r ? 



1 - r ig 



It is clear that S will become larger and larger as the 
number of tests is increased. 

Now, we saw that the square of the multiple correlation
r_m is obtained when we multiply each of the weights by r_ig
and sum the products. That is to say

$$r_m^2 = \sum (\text{weight} \times \text{saturation}) = \frac{1}{1 + S} \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2} = \frac{S}{1 + S}$$


This fraction will be the nearer to unity, the larger S is ; 
and we can make S larger and larger by adding more and 
more (hierarchical) tests to the team. Thus in theory we 
can make a team to give as high a multiple correlation 
with g as we desire. It will also be noticed, however, 
from our table that the tests with high g saturation make 
much the largest contribution to S, and therefore to the
multiple correlation (see Piaggio, 1933, 89). 
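
Spearman's tabular calculation lends itself to a few lines of code. The sketch below (an illustration added here, not Spearman's or the author's own procedure in any literal sense) forms S, the regression weights r_ig/(1 - r_ig^2) divided by (1 + S), and the squared multiple correlation S/(1 + S) for the four saturations of the example. The function name `spearman_g_weights` is hypothetical.

```python
def spearman_g_weights(saturations):
    """Return S, the regression weights for estimating g, and r_m squared.

    Assumes a hierarchical battery, so that each test is described by
    its saturation (correlation) with g alone.
    """
    ratios = [r * r / (1.0 - r * r) for r in saturations]
    S = sum(ratios)
    weights = [r / (1.0 - r * r) / (1.0 + S) for r in saturations]
    r_m_squared = S / (1.0 + S)
    return S, weights, r_m_squared


S, weights, r_m_squared = spearman_g_weights([0.9, 0.8, 0.7, 0.6])
print(round(S, 4))
print([round(w, 4) for w in weights])
print(round(r_m_squared, 3))
```

Small differences in the fourth decimal place from the table are to be expected, since the table multiplies by the rounded value .1168.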

2. Estimating two common factors simultaneously. We
have seen in the preceding section how to estimate a man's
g from his scores in a hierarchical team of tests, and in
this section we shall consider the broader question of
estimating factors in general. Thus in Chapter II the four
tests with correlations :

           1      2      3      4

    1      .     .4     .4     .2
    2     .4      .     .7     .3
    3     .4     .7      .     .3
    4     .2     .3     .3      .

were analysed into two common factors and four specifics
with the loadings (see Chapter II, page 36) :

            Common Factors            Specific Factors
    Test      I        II

     1      .5164      .        .8563      .        .        .
     2      .7746    .3162        .      .5477      .        .
     3      .7746    .3162        .        .      .5477      .
     4      .3873      .          .        .        .      .9220

Any one column of these loadings can be used as the
criterion row in the calculation by Aitken's method, and
the regression coefficients calculated with which to weight
a man's test scores in order to estimate that factor for
him. If, as is probable, we want to estimate both common
factors, we can do the two calculations together, as shown
in the calculation below. Both rows of loadings are written
below the matrix of intercorrelations, and then pivotal
condensation automatically gives both sets of regression
coefficients, with only one extra row in each slab of the
calculation.

If, therefore, we have a man's scores (in standard
measure) in these four tests, our estimate of his Factor I
will be (see the last slab of the calculation)

$$\hat f_1 = .1787 z_1 + .3932 z_2 + .3932 z_3 + .1156 z_4$$

                                                                             Check

   (1.0)     .4      .4      .2    |  -1.0       .        .        .     |   1.0
     .4     1.0      .7      .3    |    .      -1.0       .        .     |   1.4
     .4      .7     1.0      .3    |    .        .      -1.0       .     |   1.4
     .2      .3      .3     1.0    |    .        .        .      -1.0    |    .8
   .5164   .7746   .7746   .3873   |    .        .        .        .     |  2.4529
     .     .3162   .3162     .     |    .        .        .        .     |   .6324

           (.84)     .54     .22   |   .40     -1.0       .        .     |   1.0
           1.00    .6429   .2619   |  .4762   -1.1905     .        .     |  1.1905
            .54     .84     .22    |   .40       .      -1.0       .     |   1.0
            .22     .22     .96    |   .20       .        .      -1.0    |    .6
           .5680   .5680   .2840   |  .5164      .        .        .     |  1.9365
           .3162   .3162     .     |    .        .        .        .     |   .6324

                  (.4928)  .0786   |  .1429    .6429    -1.0       .     |   .3571
                  1.0000   .1595   |  .2900   1.3046   -2.0292     .     |   .7246
                   .0786   .9024   |  .0952    .2619      .     -1.0000  |   .3381
                   .2028   .1352   |  .2459    .6762      .        .     |  1.2603
                   .1129  -.0828   | -.1506    .3764      .        .     |   .2560

                          (.8899)  |  .0724    .1594    .1594   -1.0000  |   .2811
                          1.0000   |  .0814    .1791    .1791   -1.1237  |   .3159
                           .1029   |  .1871    .4116    .4116      .     |  1.1134
                          -.1008   | -.1833    .2291    .2291      .     |   .1742

   Regression Coefficients     I   |  .1787    .3932    .3932    .1156   |  1.0809
                               II  | -.1751    .2472    .2472   -.1133   |   .2060



and estimates made in this way will have a multiple
correlation r_m with the "true" values of the factor, in a
number of different candidates, given by

$$r_m^2 = .1787 \times .5164 + .3932 \times .7746 + .3932 \times .7746 + .1156 \times .3873 = .7462$$

$$r_m = .864$$

Similarly, the multiple correlation of the estimate of the
second factor with the "true" values can be found to be

$$r_m = .395$$
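
The regression coefficients found by pivotal condensation are the solution of the equations R w = l, where R is the matrix of intercorrelations and l the column of loadings of the factor to be estimated. As a cross-check only, and not as a replacement for Aitken's layout, the sketch below solves these equations directly by ordinary Gauss-Jordan elimination (a different arrangement of the same arithmetic). The helper `solve` and the dictionary names are introduced here for illustration.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    # Work on an augmented copy so that the caller's matrix is untouched.
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for k in range(n):
            if k != col:
                factor = M[k][col]
                M[k] = [vk - factor * vc for vk, vc in zip(M[k], M[col])]
    return [M[i][n] for i in range(n)]


R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]
loadings = {
    "I": [0.5164, 0.7746, 0.7746, 0.3873],
    "II": [0.0, 0.3162, 0.3162, 0.0],
}

weights = {}
r_m = {}
for name, column in loadings.items():
    w = solve(R, column)
    weights[name] = w
    # Square of the multiple correlation = sum of (weight x loading).
    r_m[name] = sum(wi * li for wi, li in zip(w, column)) ** 0.5
    print(name, [round(x, 4) for x in w], round(r_m[name], 3))
```

The weights should agree with the last slab of the condensation to about three decimal places, any remaining difference being due to rounding in the hand calculation.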

The two factors are not, therefore, estimated with equal 
accuracy by the team. As with ordinary estimation, the 
whole calculation can be checked by a pooling square. 
This check for the second factor is as follows : 

                    1      |  -.1751    .2472    .2472   -.1133

      1          1.0000    |     .      .3162    .3162      .
    -.1751          .      |    1.0      .4       .4       .2
     .2472        .3162    |     .4     1.0       .7       .3
     .2472        .3162    |     .4      .7      1.0       .3
    -.1133          .      |     .2      .3       .3      1.0

Multiplying the rows gives :

                    1      |  -.1751    .2472    .2472   -.1133

                 1.00000   |     .      .3162    .3162      .
                    .      |  -.17510  -.07004  -.07004  -.03502
                  .07816   |   .09888   .24720   .17304   .07416
                  .07816   |   .09888   .17304   .24720   .07416
                    .      |  -.02266  -.03399  -.03399  -.11330

    Sums of
    columns       .15633   |     .      .31621   .31621     .

Multiplying then by the column multipliers, and adding,
we get :

    1.00000    .15633
     .15633    .15633

where the equality of the three quadrants shows that our
regression weights were correct : and the multiple
correlation is $\sqrt{.15633} = .395$.

We have now found the regression equations for
estimating the two common factors by treating each in turn
as a "criterion." It is also possible to estimate a man's
specific factors in the same way. Indeed, we might have
written the loadings of the specific factors as four more
rows below the common-factor loadings in the first slab
and calculated their regression coefficients all in the one
calculation. But it is easier to obtain the estimate of a
man's specific by subtraction (compare Abilities, 1932
edition, page xviii, line 10). For example, we know that
the second test score is made up as follows

$$z_2 = .7746 f_1 + .3162 f_2 + .5477 s_2$$

where $f_1$ and $f_2$ are the man's common factors and $s_2$ his
specific. We have estimated his $f_1$ and $f_2$, and we know
his $z_2$ ; so we can estimate his $s_2$ from this equation. The
estimates of all a man's factors, to be consistent with the
experimental data, must satisfy this equation and similar
equations for the other tests. If the estimate of the
specific is actually made by a regression equation, just like
the other factors, it will be found to satisfy this
requirement.* From the estimates of all a man's factors,
therefore, including any specifics, we can reconstruct his scores
in the tests exactly. From only a few factors, however,
even from all the common factors, we cannot reproduce
the scores exactly, but only approximately.

3. An arithmetical short cut (Ledermann, 1938a, 1939).
If the number of tests is appreciably greater than the
number of common factors, the following scheme for
computing the regression coefficients will involve less
arithmetical labour than the general formulae expounded
in Chapter VI and applied to the factor problem in this
chapter.†

For illustration, we shall use the data of the preceding
section (page 108), although in that example the number
of tests (four) exceeds the number of common factors (two)
only by two, which is too small an amount to demonstrate
fully the advantages of the present method.

* It is interesting to note that we know the best relative loadings
of the tests to estimate a specific by regression without needing to
know how many common factors there are, or whether indeed any
specific exists or not. (Wilson, 1934. For the same fact in more
familiar notation, see Thomson, 1936a, 43.)

† This short cut, in the form here given, is only applicable to
orthogonal factors. For oblique factors, which are described below
in Chapter XVIII, modifications are necessary in Ledermann's
formulae, for which see Thomson (1949) and the later part of section
19 of the Mathematical Appendix, page 378.

The common-factor loadings and the specifics of the four
tests form a 4 x 2 matrix and a 4 x 4 matrix respectively,
thus :

            | .5164     .   |            | .8563     .       .       .   |
    M_0  =  | .7746   .3162 |    M_1  =  |   .     .5477     .       .   |
            | .7746   .3162 |            |   .       .     .5477     .   |
            | .3873     .   |            |   .       .       .     .9220 |

the matrix M_0 being identical with the first two columns,
and the matrix M_1 with the last four columns of the table
on page 107. Before the data are subjected to the
computational routine process, which will again consist in the
pivotal condensation of a certain array of numbers, some
preliminary steps have to be taken : (i) the loadings of
each test are divided by the square of its specific, and the
modified values are then listed in a new 4 x 2 matrix,
which in matrix notation is $M_1^{-2} M_0$ :

                       |  .7042      .    |
    M_1^{-2} M_0   =   | 2.5820   1.0540  |
                       | 2.5820   1.0540  |
                       |  .4556      .    |

e.g.   2.5820 = (.7746) ÷ (.5477)^2
       1.0540 = (.3162) ÷ (.5477)^2

(ii) Next, the inner products (see footnote on page 81) of
every column of M_0 in turn with every column of the
modified matrix of step (i), $M_1^{-2} M_0$, are calculated and
arranged in a 2 x 2 matrix :

          | 4.5401   1.6329 |
    J  =  | 1.6329    .6665 |

i.e. the first row of this matrix contains the inner products
of the first column of M_0 with all the columns of the
modified matrix ; similarly the second row of J contains all
those inner products which involve the second column of
M_0, e.g.

    4.5401 = .5164 x .7042 + .7746 x 2.5820
             + .7746 x 2.5820 + .3873 x .4556
     .6665 = .3162 x 1.0540 + .3162 x 1.0540

If there had been r common factors the matrix J would
have been an r x r matrix. The arithmetic is simplified
by the fact that J is always symmetrical about its diagonal,
so that only the entries on and above (or below) the diagonal
need be calculated. (iii) Finally, each element on the
diagonal of J is augmented by unity, giving, in the notation
of matrix calculus, the matrix :

              | 5.5401   1.6329 |
    I + J  =  | 1.6329   1.6665 |

This matrix is now "bordered" below by the modified
matrix of step (i), and on the right-hand side by a block of
minus ones and zeros in the usual way. The process of
pivotal condensation then yields the same regression
coefficients as were obtained on page 108.

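                                                        Check

    (5.5401)   1.6329   |  -1.0000      .      |   6.1730
     1.0000     .2947   |   -.1805      .      |   1.1142
     1.6329    1.6665   |      .     -1.0000   |   2.2994
      .7042      .      |      .        .      |    .7042
     2.5820    1.0540   |      .        .      |   3.6360
     2.5820    1.0540   |      .        .      |   3.6360
      .4556      .      |      .        .      |    .4556

              (1.1853)  |    .2947   -1.0000   |    .4800
               1.0000   |    .2486    -.8437   |    .4050
               -.2075   |    .1271      .      |   -.0804
                .2931   |    .4661      .      |    .7591
                .2931   |    .4661      .      |    .7591
               -.1343   |    .0822      .      |   -.0520

    Regression coefficients (one row for each test ;
    first column Factor I, second column Factor II)

                        |    .1787    -.1751   |    .0036
                        |    .3932     .2473   |    .6404
                        |    .3932     .2473   |    .6404
                        |    .1156    -.1133   |    .0023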
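
The three preliminary steps and the final condensation amount, in matrix terms, to forming K = I + J and then multiplying the inverse of K into the transposed modified matrix of step (i). The sketch below follows that reading for the two-factor example. It is an illustration under that assumption, not Ledermann's own layout, and the explicit 2 x 2 inverse it uses would have to be replaced by a general solver if there were more than two common factors. The variable names are hypothetical.

```python
# Common-factor loadings (4 tests x 2 factors) and specific loadings.
M0 = [
    [0.5164, 0.0],
    [0.7746, 0.3162],
    [0.7746, 0.3162],
    [0.3873, 0.0],
]
specifics = [0.8563, 0.5477, 0.5477, 0.9220]
n_tests, n_factors = len(M0), 2

# Step (i): divide each test's loadings by the square of its specific.
modified = [[x / (s * s) for x in row] for row, s in zip(M0, specifics)]

# Step (ii): inner products of the columns of M0 with those of `modified`.
J = [[sum(M0[t][a] * modified[t][b] for t in range(n_tests))
      for b in range(n_factors)] for a in range(n_factors)]

# Step (iii): augment the diagonal by unity.
K = [[J[a][b] + (1.0 if a == b else 0.0) for b in range(n_factors)]
     for a in range(n_factors)]

# Inverse of the 2 x 2 matrix K (valid only for two common factors).
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
K_inv = [[K[1][1] / det, -K[0][1] / det],
         [-K[1][0] / det, K[0][0] / det]]

# weights[a][t] is the regression weight of test t for factor a.
weights = [[sum(K_inv[a][b] * modified[t][b] for b in range(n_factors))
            for t in range(n_tests)] for a in range(n_factors)]

for label, row in zip(("I", "II"), weights):
    print(label, [round(x, 4) for x in row])
```

Only a matrix of order r (here 2) has to be inverted, instead of one of order n (here 4), which is the source of the saving in labour when the tests greatly outnumber the common factors.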



4. Reproducing the original scores. Let us imagine a
man who in each of the four tests in our example obtains
a score of + 1 ; that is, one standard deviation above the
average. We choose this set of scores merely to make the
arithmetic of the example easy. The regression estimates
of his two common factors are

$$\hat f_1 = .1787 z_1 + .3932 z_2 + .3932 z_3 + .1156 z_4$$
$$\hat f_2 = -.1751 z_1 + .2472 z_2 + .2472 z_3 - .1133 z_4$$

Inserting his scores $z_1 = z_2 = z_3 = z_4 = 1$ into these
equations we get for the regression estimates of his factors

$$\hat f_1 = 1.0807$$
$$\hat f_2 = .2060$$

that is, we estimate his first factor to be rather more than
one standard deviation, his second factor to be about
one-fifth of a standard deviation, above the average.

Now, the specification equations which give the
composition of the four tests in terms of the factors are

$$z_1 = .5164 f_1 + .8563 s_1$$
$$z_2 = .7746 f_1 + .3162 f_2 + .5477 s_2$$
$$z_3 = .7746 f_1 + .3162 f_2 + .5477 s_3$$
$$z_4 = .3873 f_1 + .9220 s_4$$

If we insert the above estimates $\hat f_1$ and $\hat f_2$ in lieu of
$f_1$ and $f_2$, we get for this man's scores

$$z_1 = .5581 + .8563 \hat s_1$$
$$z_2 = .9022 + .5477 \hat s_2$$
$$z_3 = .9022 + .5477 \hat s_3$$
$$z_4 = .4186 + .9220 \hat s_4$$

We know his four scores each to have been + 1, and
if we had also worked out the estimates of his specifics
by the regression method we should have found that they
added just enough to the above equations to make each
indeed come to + 1. We can, therefore, find his estimated
specifics more easily from the above equations, as in this
case

$$\hat s_1 = \frac{1 - .5581}{.8563} = .5161$$
$$\hat s_2 = \frac{1 - .9022}{.5477} = .1786$$

and so for $\hat s_3$ and $\hat s_4$, subtracting the contribution of the
common factors from the known score (here + 1 in each
case) and dividing by the specific loading.
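
The subtraction just described is easily mechanised. The following sketch, offered only as an illustration with made-up variable names, takes the regression weights and loadings of the example, estimates the two common factors for the man whose four scores are all + 1, obtains his specifics by subtraction, and confirms that the full set of estimates reproduces his scores.

```python
# Loadings of each test on the two common factors, and its specific loading.
common_loadings = [(0.5164, 0.0), (0.7746, 0.3162), (0.7746, 0.3162), (0.3873, 0.0)]
specific_loadings = [0.8563, 0.5477, 0.5477, 0.9220]

# Regression weights for the two common factors, as found above.
weights_I = [0.1787, 0.3932, 0.3932, 0.1156]
weights_II = [-0.1751, 0.2472, 0.2472, -0.1133]

scores = [1.0, 1.0, 1.0, 1.0]  # one standard deviation above average throughout

f1 = sum(w * z for w, z in zip(weights_I, scores))
f2 = sum(w * z for w, z in zip(weights_II, scores))

# Each specific: (score - contribution of common factors) / specific loading.
specifics = [(z - a * f1 - b * f2) / s
             for z, (a, b), s in zip(scores, common_loadings, specific_loadings)]

# Putting all the estimated factors back reproduces the scores exactly.
rebuilt = [a * f1 + b * f2 + s * sp
           for (a, b), s, sp in zip(common_loadings, specific_loadings, specifics)]

print(round(f1, 4), round(f2, 4))
print([round(sp, 4) for sp in specifics])
print([round(z, 6) for z in rebuilt])
```

The reconstruction is exact by construction here; the point made in the text is that specifics estimated by regression would satisfy the same equations.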

The regression estimates of the factors, made by the 
system we have so far been considering, are as a matter 
of fact not the only estimates which have been proposed. 
The alternative system has certain advantages, to be 
explained later. The regression estimates are the best in
the sense, as we said when deducing them, that they give
the highest correlation, taken over a large number of men,
between the estimates and the true values of a criterion
when the latter can be separately ascertained. Just what
this correlation means, however, when there is no possibility
of ascertaining the "true" values (for factors, when they
outnumber the tests, can only be estimated) it is not so
easy to say.

The regression estimates of the factors, as calculated in 
the present chapter, have one other great advantage, that 
they are consistent with the ordinary estimation of voca- 
tional ability made without using factors at all, as can 
best be shown by means of the example of Section 7 of 
Chapter VI. 

5. Vocational advice with and without factors. In that
example we had an "occupation" $z_0$, and four tests
$z_1$, $z_2$, $z_3$, and $z_4$ ; and in Chapter VI, without using factors
at all, we arrived at the following estimation of a man's
success or "score" in the occupation (which is, after all,
only a test like the others, though a long-drawn-out one)

$$\hat z_0 = .390 z_1 + .222 z_2 + .018 z_3 + .431 z_4$$

Now let us suppose that the matrix of correlations of
these five tests (including the occupation as a test) had
been analysed, by Thurstone's method or any other, into
common factors and specifics (the matrix is given in
Chapter VI, page 91). Indeed, the four tests proper were
so analysed by Dr. Alexander in the monograph from which
we took their correlations, and the analysis below is based
on his. The "occupation" $z_0$ is a pure fiction made for
the purpose of this illustration, but we can easily imagine it
also being analysed in exactly the same way as a test.
The table of loadings of the factors, to which we may as
well give Dr. Alexander's names of g (Spearman's g), v (a
verbal factor), and F (a practical factor), is as follows :

                                       g      v      F     Specific

    Occupation              z_0       .55    .45    .60      .37
    Stanford-Binet          z_1       .66    .52    .21      .50
    Reading test            z_2       .52    .66     .       .54
    Geometrical analogies   z_3       .74     .      .       .67
    Picture completion      z_4       .37     .     .71      .60

With this table of loadings in our possession we might
have given vocational advice to a man in a roundabout
way. Instead of inserting his scores in $z_1$, $z_2$, $z_3$, and $z_4$ in
the equation (see page 93)

$$\hat z_0 = .390 z_1 + .222 z_2 + .018 z_3 + .431 z_4$$

we might have estimated his factors g, v, and F from his
scores in the four tests, and then inserted these estimated
factors in the specification equation of the occupation

$$z_0 = .55 g + .45 v + .60 F + .37 s_0$$

(ignoring the specific $s_0$, which cannot be estimated from
$z_1$, $z_2$, $z_3$, and $z_4$). Had we done so, we should have arrived
at exactly the same numerical estimate of his $z_0$ as by the
direct method (Thomson, 1936a, 49 and 50).

The actual estimation of the factors g, v, and F from the
four tests will form a good arithmetical exercise for the
student. The beginning and end of the calculation of the
regression coefficients is shown here, following exactly the
lines of the smaller example on page 108 of this chapter :

                                                                 Check

    1.00    .69    .49    .39   |   -1     .     .     .    |    1.57
     .69   1.00    .38    .19   |    .    -1     .     .    |    1.26
     .49    .38   1.00    .27   |    .     .    -1     .    |    1.14
     .39    .19    .27   1.00   |    .     .     .    -1    |     .85
     .66    .52    .74    .37   |    .     .     .     .    |    2.29
     .52    .66     .      .    |    .     .     .     .    |    1.18
     .21     .      .     .71   |    .     .     .     .    |     .92

This reduces by pivotal condensation step by step to the
three sets of regression coefficients :

    for g     .300    .095    .532    .095
    for v     .353    .581   -.352   -.153
    for F     .121   -.148   -.206    .747

The result is to give us three equations for estimating
g, v, and F from a man's scores in the four tests, viz.

$$\hat g = .300 z_1 + .095 z_2 + .532 z_3 + .095 z_4$$
$$\hat v = .353 z_1 + .581 z_2 - .352 z_3 - .153 z_4$$
$$\hat F = .121 z_1 - .148 z_2 - .206 z_3 + .747 z_4$$

Now let us assume a set of scores $z_1$, $z_2$, $z_3$, $z_4$ for a man,
and see what the estimate of his occupational ability is by
the two methods, the one direct without using factors, the
other by way of factors. Suppose his four scores are

    z_1     z_2     z_3     z_4
     .2     -.4      .7      .6

The estimates of his factors g, v, and F will therefore be

$$\hat g = .300 \times .2 + .095 \times (-.4) + .532 \times .7 + .095 \times .6 = .451$$
$$\hat v = .353 \times .2 + .581 \times (-.4) - .352 \times .7 - .153 \times .6 = -.500$$
$$\hat F = .121 \times .2 - .148 \times (-.4) - .206 \times .7 + .747 \times .6 = .387$$

If now we insert these estimates of his factors into
the specification equation of the occupation, ignoring its
specific, we get for our estimate of his occupational success :

$$\hat z_0 = .55 \times .451 + .45 \times (-.500) + .60 \times .387 = .255$$

that is, we estimate that he will be about a quarter of a
standard deviation better than the average workman.
This by the indirect method using factors.

By the direct method, without using factors at all, we
simply insert his test scores into the equation

$$\hat z_0 = .390 z_1 + .222 z_2 + .018 z_3 + .431 z_4$$

and obtain

$$\hat z_0 = .390 \times .2 + .222 \times (-.4) + .018 \times .7 + .431 \times .6 = .260$$

exactly the same estimate as before, for the difference in
the third decimal place is entirely due to "rounding off"
during the calculation. The third decimal place of the
direct calculation is more likely to be correct, since it is
so much shorter.
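
The agreement of the two routes can be checked in a few lines. The sketch below is an illustration only, with hypothetical variable names. It uses the three-decimal coefficients printed above, so the two estimates agree only to about two decimal places, exactly as in the hand calculation. It also forms the test weights implied by the indirect route, which should be close to the direct regression weights.

```python
# Regression weights for estimating each factor from the four tests.
factor_weights = {
    "g": [0.300, 0.095, 0.532, 0.095],
    "v": [0.353, 0.581, -0.352, -0.153],
    "F": [0.121, -0.148, -0.206, 0.747],
}
# Loadings of the occupation on the three common factors (specific ignored).
occupation_loadings = {"g": 0.55, "v": 0.45, "F": 0.60}
# Direct regression weights of the occupation on the four tests.
direct_weights = [0.390, 0.222, 0.018, 0.431]

scores = [0.2, -0.4, 0.7, 0.6]

# Indirect route: estimate the factors, then the occupation from the factors.
estimates = {name: sum(w * z for w, z in zip(ws, scores))
             for name, ws in factor_weights.items()}
indirect = sum(occupation_loadings[name] * estimates[name] for name in estimates)

# Direct route: insert the scores in the regression equation for the occupation.
direct = sum(w * z for w, z in zip(direct_weights, scores))

# Test weights implied by going through the factors.
implied_weights = [sum(occupation_loadings[name] * factor_weights[name][t]
                       for name in factor_weights) for t in range(4)]

print({name: round(value, 3) for name, value in estimates.items()})
print(round(indirect, 3), round(direct, 3))
print([round(w, 3) for w in implied_weights])
```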

6. Why, then, use factors at all ? The reader may now 
ask, " What, then, is the use of estimating a man's factors 
at all ? " Well, in a case analogous to that of the present 
example, it is quite unnecessary to use factors at all, and 
there is no doubt that a great many experimenters have 
rushed to factorial analysis with quite unjustifiable hopes 
of somehow getting more out of it than ordinary methods 
of vocational and educational advice can give without 
mentioning factors. But we must not go to the other 
extreme and "throw out the baby with the bath-water."
There may be other reasons for using factors, apart from
vocational advice. And even in giving such advice, which
really means describing men and occupations in similar 
terms, so that we can see if they fit one another or not, it 
may be that factors have some advantages not disclosed 
by the above calculation. 

This man whom we have used above, for example, may 
be described either in terms of his scores in four fairly 
well-known tests, or in terms of the factors g, v, and F.
By the former method his description is :

    Stanford-Binet test                  .2, slightly above average
    Thorndike reading test              -.4, distinctly below average
    Spearman's geometrical analogies     .7, good
    Picture-completion test              .6, good

This description already suggests to us that he is a man of 
average intelligence or rather better, of not much schooling, 
and with a bit of a gift for seeing shapes, and similarities in 
them. From the correlations of the occupation with these 
four tests we know that it most resembles the first and last 
tests and least resembles the third. We can probably 
draw the conclusion that this man will be above average 
in it ; and we can draw this conclusion accurately if we 
calculate the regression equation 


$$\hat z_0 = .390 z_1 + .222 z_2 + .018 z_3 + .431 z_4$$

As a description of the man, however, the above table
suffers from the fact that the four tests are correlated with 
one another. We feel a certain clarity in the description 
in terms of factors, because these are independent of one 
another and uncorrelated. This man whom we are at 
present considering is alternatively described, in terms of 
factors, as : 

    Factor     Estimated Amount

      g             .451
      v            -.500
      F             .387

that is, a quite intelligent (g) and practical (F) man with,
however, not much ability in using and understanding
words (v). There is a certain air of greater generality
about the factors than there is about the particular tests
from which they have been deduced, and they give
definition and point to mental descriptions, or at least they
seem to do so.

Yet some of these " advantages " of using factors begin 
to look less bright when looked into more carefully. We 
said that one advantage is that factors are independent 
and uncorrelated. So they are, if their true values are
known. But we only know their estimates, and these are 
correlated, as we shall illustrate shortly. If we use factors 
it is clear that we must, if we value the advantage of 
independence, seek to obtain estimates which are as little 
correlated with one another as possible. There have been 
proposals to use factors which are really correlated ; not 
merely correlated when their estimates are taken, but 
correlated in their true measures. What advantage can 
these have over the actual correlated tests ? The funda- 
mental advantage hoped for by the factorist seems to be 
that the factors (correlated or uncorrelated) may turn out
to be comparatively few in number, and may thus replace 
a multitude of tests and innumerable occupations by a 
description in these few factors. The student whose 
knowledge of the subject is being obtained from this book 
is not yet equipped to discuss adequately the very funda- 
mental questions raised in this section, to which we shall 
return several times in later chapters. One last point in 
favour of factors may, however, be expanded somewhat 
here. We said a couple of sentences back that factorists 
hope to give adequate descriptions of men and of occupa- 
tions in terms of a comparatively small number of factors. 
This, if achieved, would react on social problems somewhat 
in the same way as the introduction of a coinage influences 
trade previously carried on by barter. A man can ex- 
change directly five cows for so many sheep, so much 
cloth, and a new ploughshare ; but the transaction is 
facilitated if each of these articles is priced in pounds, 
shillings, and pence, or in dollars and cents, even though 
the end result is the same. And so perhaps with the 
" pricing " of each man and each occupation in terms of a 
few factors. 

But the prices must be accurate ; and the analyses of 
tests and occupations into factors, still more the calculation
of quantitative estimates of these factors, are as yet very 
inaccurate, and perhaps are inherently subject to uncer- 
tainty. A fluctuating and doubtful coinage can be a 
positive hindrance to trade, and barter may be preferable 
in such circumstances. 

We showed in Section 5 above that a direct regression 
estimate of a man's ability in an occupation gives identically 
the same result as an estimate via the roundabout path of 
factors, so that at least when the direct regression estimate 
is possible there can be no quantitative advantage in using 
factors. When, however, is the direct regression estimate 
possible, and when is it impossible ? 

To make the direct regression estimate we require the 
complete table of correlations of the tests with one another 
and with the occupation, and we have to know the candidate's 
scores in the tests. This implies that these same tests have 
been given to a number of workers whose proficiency in the 
occupation is known, for otherwise we would not know the 
correlations of the tests with the occupation. Under these 
ideal circumstances any talk of factors is certainly unneces- 
sary so far as obtaining a quantitative estimate is concerned. 

But suppose these ideal conditions do not hold ! These 
tests which we have given to the candidate have never 
been given, at any rate as a battery, to workers in the
occupation, and their correlations with the occupation are 
unknown ! This situation is particularly likely to arise in 
vocational advice or guidance as distinguished from 
vocational selection. In the latter we are, usually on 
behalf of the employer, selecting men for a particular job, 
and we are practically certain to have tried our tests on 
people already in the job, and to be in a position to make 
a direct estimation without factors. But in vocational 
guidance we wish to gauge the young person's ability in 
very many occupations, and it is unlikely that just this 
battery of tests that we are using has been given to workers 
in all these different jobs. In that case we cannot make a 
direct regression estimate of our candidate's probable 
proficiency in every occupation. Can we, then, obtain an 
estimate in any other way ? 
Other ways are conceivable, but it must at the outset
be emphasized that they are bound to be less accurate than 
the direct estimate without factors. Although this battery 
of tests has not been given to workers in the occupation, 
perhaps other tests have, and by the aid of that other 
battery a factor analysis of the occupation has perhaps 
been made. If our tests enable the same factors to be 
estimated, we can gauge the man's factors and thence 
indirectly his occupational proficiency. Unfortunately, 
the " if " is a rather big one. Are factors obtained by 
the analysis of different batteries of tests the same factors ; 
may they not be different even though given the same 
name ? We shall discuss this very important point later, 
but meanwhile let us suppose that we have reasonable 
confidence in the identity of factors called by the same 
name by different workers with different batteries. Then 
the probable course of events would be something like this. 
An experimenter, using whatever tests he thinks practicable 
and suitable, analyses an occupation into factors. Another 
experimenter, at a different time and place, is asked to 
give advice to a candidate for that occupation. Using 
whatever tests he in his turn has available, he assesses in 
this candidate the factors which the previous experimenter's 
work leads him to think are necessary in the occupation, 
and gives his advice accordingly. The factors have played 
their part as a go-between, like a coinage. All depends on 
the confidence we have in the identity of the factors. We 
shall see later that there is only too much reason to think 
that the possibility of this confidence being misplaced has 
hardly been sufficiently realized by many over-enthusiastic 
factorists. And even if the common factors are identical, 
there remains the danger that the " specific " of the occu- 
pation may be correlated with some of the " specifics " 
of the tests, a fact which cannot be known unless the same 
tests have been given to workers in the occupation. 

7. The geometrical picture of correlated estimates. Of the 
swarm of difficulties and doubts raised by these remarks 
we shall choose one to deal with first. We said that even 
although we make our analysis of the tests we use into 
uncorrelated factors, the estimates of these factors will be 
correlated. This can best be appreciated if we consider
what the estimation of factors means in terms of the 
geometrical picture of Chapter IV, which we also used in 
Chapter VI, Figure 19 (page 96). In this latter figure 
we were illustrating the straightforward process of esti- 
mating a " criterion " x, given a man's scores in two tests 
y and z. We saw that these two scores did not tell us 
exactly the man's position in the three-dimensional space 
of x, y, and z, but only told us that he stood somewhere 
along a line P'PP" at right angles to the plane of yz. In 
default of his exact point, we took the point P, which is 
where the average man of the array P'PP" stands, and by 
projection from it on to the vector x found an estimate 
OX of his x score. 

Exactly the same picture will serve for the estimation 
of a factor, if we suppose the vector x to be now the vector 
of a factor (say a) whose angles with y and z are known,
for the loadings of y and z with a are their cosines.

Now, suppose that we are referring these two tests y and
z to three uncorrelated factors. It is immaterial whether
any of these factors are specifics, for a specific is estimated
exactly like any other factor. We shall call them simply
a, b, and c. Since the three factors are uncorrelated, they
are represented in the geometrical picture by orthogonal
(i.e. rectangular) axes, as shown in Figure 20. The vectors
a and b are at right angles to each other in the plane of the
paper, while the vector c is at right angles to both of them,
standing out from the paper. These axes are to be imagined
as continued backwards in their negative directions also,
but only their positive portions are shown, to avoid
confusing the diagram. The vectors y and z, also shown only
in their positive directions, represent the two tests, and the
angle between them represents by its cosine their correlation
with one another. These two vectors y and z are not in
any of the planes ab, bc, or ca, but project into the space
between them.

Figure 20.

The three orthogonal planes ab, bc, and ca divide the
whole of three-dimensional space into eight octants, and if
as is usual the final positions chosen for a, b, and c are
such that all loadings are positive, the positive directions
of y and z will project into the positive octant as shown
in the figure, in which the vector z is coming out of the 
paper more steeply than y is. 

The two vectors Oy and Oz define a plane, on which a 
circle has been drawn, which in the figure appears as an 
ellipse, since the plane yz is not in the paper but inclined 
to it. 

In the three-dimensional space defined by abc, the 
population of all persons is represented by a spherical 
swarm of points dense at O, more sparse as the distance 
from O increases in any direction. From any point in 
this space, perpendiculars can be dropped to a, b, c, y, and z, 
and the distances from O to the feet of these perpendiculars 
represent the amount of the factors a, b, and c possessed 
by the person whom that point represents, and his scores 
in the two tests. Conversely, a knowledge of his three 
factors would enable us to identify his point by erecting 
three perpendiculars and finding their meeting-point. But 
a knowledge of his scores in y and z does not enable us to 
identify his point, but only to identify a line P'PP", 
anywhere on which his point may lie. In the figure, let 
OY and OZ represent a person's scores in y and z. Then 
on the plane yz we may draw perpendiculars meeting at P. 
But the point representing the person whose scores are 
OY and OZ need not be at P; it can be anywhere in 
P'PP" at right angles to the plane yz, for wherever it is 
on this line, perpendiculars from it on to y and z will fall 
on the points Y and Z. In estimating factors from tests 
we have to choose one point on P'PP" from which to 
drop perpendiculars on to a, b, and c, and we choose P 
because the man at P is the average man of the array of 
men P'PP". Thus when we are estimating factors, all 
our population is represented by points on the plane yz 
(the plane on which in the figure the circle is drawn which 
looks like an ellipse), although really they should be
represented by a spherical swarm of dots. 

When the population is truly represented by its spherical 
swarm of dots, the axes a, b, and c represent uncorrelated 
factors. But when the spherical swarm of dots has been 
collapsed or projected on to the diametrical plane yz this 
is no longer the case. By taking only points in the plane 
yz from which to estimate factors in a three-dimensional 
space we have passed as it were from the geometrical 
picture of Chapter IV to the geometrical picture used in 
the first portion of Chapter V, where correlation between 
rectangular axes was indicated by an ellipsoidal distribu- 
tion of the population points. We have introduced 
correlation between the estimates of a, b, and c, because 
we have distorted the distribution of the population from 
a three-dimensional sphere to a flat circle on the plane yz, 
that is to an ellipsoid, for in a space of three dimensions 
the circle is an ellipsoid with two axes equal and the third 
one zero. Consider, for example, the particular point P
shown in the figure. From it, projections on to a, b, and c
are all positive; the man with scores OY and OZ in y and z
is estimated to have all three factors above the average,
which adds to their positive correlation. But in actual
fact, since P may really lie anywhere along P'PP", a line
which does not remain for its whole length in the positive
octant abc, the man may really have some of his factors
positive and some negative.

If, together with the population, the rectangular axes
a, b, and c are also projected on to the plane yz, these
projections will not all be at right angles; obviously they
cannot, for three lines in a plane cannot all be at right
angles to one another. The angles between these projections
of the factor axes on to the test plane represent the
correlations between the estimated factors.

Our illustration has been only in two and three dimen- 
sions, for clearness and to permit of figures being drawn. 
Similar statements, however, are true of more tests and 
more factors, where the spaces involved are of dimensions 
higher than three. If there are n tests, the n test vectors
define an n-space, analogous to the yz plane of Figure 20.
If these n tests have been analysed into r common factors
and n specifics, n + r factors in all, the factor axes will
define an (n + r)-space analogous to the three-dimensional
abc space of Figure 20. A man's n scores in the tests 
define his position P in the n-space of the tests, but he 
may be anywhere in a space P'PP", of r dimensions, at 
right angles to the test space, analogous to the line P'PP" 
in Figure 20. We take the point P to represent him 
faute de mieux, and project the distance OP on to the factor 
axes to get his estimated factors. These estimated factors 
are correlated with one another, and if we project the 
n + r factor axes from the (n + r)-space on to the n-space 
of the tests, the angles between these shadow vectors 
represent the correlations between the estimates. 

8. Calculation of correlation between estimates. Arith-
metically, these correlations are easily calculated from the
inner products of (b), the loadings of the estimated factors
with the tests (page 115), with (a), the loadings of the
tests with the factors (page 114). Moreover, this gives us
the opportunity to explain in passing what is meant by
"matrix multiplication."

The matrix of loadings of the four tests with the three
common factors is (page 114):

$$M = \begin{pmatrix}
.66 & .52 & .21 \\
.52 & .66 & 0 \\
.74 & 0 & 0 \\
.37 & 0 & .71
\end{pmatrix}$$

and the matrix of the loadings of the three estimated
factors with the four tests is (page 115):

$$N = \begin{pmatrix}
.300 & .095 & .532 & .095 \\
.353 & .581 & -.352 & -.153 \\
.121 & -.148 & -.206 & .747
\end{pmatrix}$$

Then the matrix of variances and covariances of the
estimated factors is

$$K = NM$$

in which formula we must explain how we form the
product of two matrices. By the product of two matrices
we mean the new matrix formed of the inner products of
the rows of the left-hand matrix with the columns of the
right-hand matrix, set down in the order as formed.
Thus, in forming the product:

$$\begin{pmatrix}
.300 & .095 & .532 & .095 \\
.353 & .581 & -.352 & -.153 \\
.121 & -.148 & -.206 & .747
\end{pmatrix}
\begin{pmatrix}
.66 & .52 & .21 \\
.52 & .66 & 0 \\
.74 & 0 & 0 \\
.37 & 0 & .71
\end{pmatrix}
=
\begin{pmatrix}
.676 & .219 & .130 \\
.218 & .567 & -.034 \\
.127 & -.035 & .556
\end{pmatrix}
= K$$

the first element .676 of K is the inner product of the first
row of N with the first column of M:

$$.300 \times .66 + .095 \times .52 + .532 \times .74 + .095 \times .37 = .676$$

In the same way, every element in K is formed. The
element -.034, in the second row and third column of K,
is the inner product of the second row of N with the third
column of M:

$$.353 \times .21 + .581 \times 0 - .352 \times 0 - .153 \times .71 = -.034$$

If our arithmetic throughout the whole calculation of
these loadings had been perfectly accurate, the matrix K
would have been perfectly symmetrical about its diagonal.
The actual discrepancies (as .127 and .130) are a measure
of the degree of arithmetical accuracy attained.*
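The rule for the product can be tried out mechanically. The following is only a sketch: the two matrices are the loadings as read from the text above (zero entries where the text gives no loading), and the helper name `matmul` is introduced here for illustration.

```python
# Loadings as read from the text: N is (estimated factors x tests),
# M is (tests x common factors). Zero cells are assumed where none is printed.
N = [
    [0.300, 0.095, 0.532, 0.095],
    [0.353, 0.581, -0.352, -0.153],
    [0.121, -0.148, -0.206, 0.747],
]
M = [
    [0.66, 0.52, 0.21],
    [0.52, 0.66, 0.00],
    [0.74, 0.00, 0.00],
    [0.37, 0.00, 0.71],
]


def matmul(left, right):
    """Inner products of the rows of `left` with the columns of `right`."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in left]


K = matmul(N, M)          # 3 x 3: variances and covariances of the estimates
for row in K:
    print(" ".join(f"{value:7.3f}" for value in row))

# The product taken in the other order is a different (4 x 4) matrix.
MN = matmul(M, N)
print(len(MN), "x", len(MN[0]))
```

Small departures from symmetry in the printed result come from the three-decimal rounding of the loadings, as the text remarks.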

The matrix K thus arrived at gives by its diagonal
elements .676, .567, and .556, the variances of the three
estimated factors (that is, the squares of their standard
deviations), and by its other elements their covariances in
pairs (that is, their overlap with one another). The
correlation of any two estimated factors is equal to (see
Chapter I, Figure 2)

$$\frac{\text{covariance}(ij)}{\sqrt{\text{variance}(i) \times \text{variance}(j)}}$$

* A trial will show the reader that the product NM is quite
different from the product MN. This is the only fundamental
difference between matrix algebra and ordinary algebra.

From K we can therefore form the matrix of correlations
of the estimated factors. It is:

$$\begin{pmatrix}
1.000 & .353 & .212 \\
.353 & 1.000 & -.061 \\
.212 & -.061 & 1.000
\end{pmatrix}$$

wherein .353, for example, is $.219 \div \sqrt{.676 \times .567}$.
Although, therefore, the "true" factors g and v are un-
correlated, their estimates $\hat{g}$ and $\hat{v}$ are correlated to an
amount .353. The "true" factors g, v, and F are in standard
measure, but their estimates $\hat{g}$, $\hat{v}$, and $\hat{F}$ have variances of
only .676, .567, and .556 instead of unity. These variances,
be it noted in passing, are equal also to the squares of the
correlations between g and $\hat{g}$, v and $\hat{v}$, F and $\hat{F}$.
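The passage from covariances to correlations can be sketched as follows. The matrix K is taken as printed (with its slight asymmetry), and the function name `correlations` is an assumption made for this illustration only.

```python
import math

# K as printed: variances on the diagonal, covariances elsewhere.
K = [
    [0.676, 0.219, 0.130],
    [0.218, 0.567, -0.034],
    [0.127, -0.035, 0.556],
]


def correlations(cov):
    """Divide each covariance by the product of the two standard deviations."""
    size = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(size)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(size)]
            for i in range(size)]


R = correlations(K)
for row in R:
    print(" ".join(f"{value:7.3f}" for value in row))
```

Because the printed K is not exactly symmetrical, the two halves of R differ in the third decimal place; the text adjusts these before quoting a single value for each pair.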

Not only are the estimates of the common factors 
correlated among themselves ; they are correlated with 
the specifics, so that the estimates of the specifics are not 
strictly specific. As a numerical illustration we may take 
the hierarchical matrix used in Section 1, pages 102 ff., 
four tests of the larger hierarchical matrix used in Chapters 
I (page 6) and II (page 23). 



$$\begin{pmatrix}
1.00 & .72 & .63 & .54 \\
.72 & 1.00 & .56 & .48 \\
.63 & .56 & 1.00 & .42 \\
.54 & .48 & .42 & 1.00
\end{pmatrix}$$

The regression estimate of g from this battery is, as we
found on page 104,

$$\hat{g} = .553 z_1 + .259 z_2 + .160 z_3 + .109 z_4$$

The regression estimates for the four specifics can also
be found, either by a full calculation like that of page
108, or by the simpler method of subtraction of page
110. Thus, to estimate $s_1$ in our present example we
know that

$$z_1 = .9 g + .436 s_1$$

Also we know that the estimates $\hat{g}$ and $\hat{s}_1$ will satisfy the
same equation

$$z_1 = .9 \hat{g} + .436 \hat{s}_1$$

that is

$$\hat{s}_1 = \frac{z_1 - .9 \hat{g}}{.436}$$

On inserting the expression for $\hat{g}$ into this we get

$$\hat{s}_1 = 1.152 z_1 - .535 z_2 - .333 z_3 - .225 z_4$$

and similarly

$$\hat{s}_2 = -.737 z_1 + 1.313 z_2 - .215 z_3 - .145 z_4$$
$$\hat{s}_3 = -.542 z_1 - .253 z_2 + 1.242 z_3 - .106 z_4$$
$$\hat{s}_4 = -.415 z_1 - .194 z_2 - .121 z_3 + 1.169 z_4$$
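The method of subtraction lends itself to a short sketch. It assumes the g saturations .9, .8, .7, .6 of the four tests and Spearman's weights for the regression estimate of g; the function names are invented here for illustration.

```python
import math

loadings = [0.9, 0.8, 0.7, 0.6]   # g saturations of the four tests


def g_weights(r):
    """Regression weights for estimating g from a hierarchical battery."""
    S = sum(x * x / (1 - x * x) for x in r)
    return [(x / (1 - x * x)) / (1 + S) for x in r]


def specific_weights(r, i):
    """Weights of the estimate of specific i, obtained by subtraction.

    Uses z_i = r_i * g_hat + sqrt(1 - r_i**2) * s_hat_i, solved for s_hat_i.
    """
    w = g_weights(r)
    specific_loading = math.sqrt(1 - r[i] ** 2)
    return [((1.0 if j == i else 0.0) - r[i] * w[j]) / specific_loading
            for j in range(len(r))]


w_g = g_weights(loadings)
w_s1 = specific_weights(loadings, 0)
print([round(x, 3) for x in w_g])
print([round(x, 3) for x in w_s1])
```

Values computed in this way may differ from the printed ones in the third decimal place, since the text works from rounded figures throughout.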

We have now both N, the matrix of loadings of the
estimated factors $\hat{g}$, $\hat{s}_1$, $\hat{s}_2$, $\hat{s}_3$, $\hat{s}_4$ with the four tests, and
M, which we already know, the matrix of loadings of the
four tests with the five factors g, $s_1$, $s_2$, $s_3$, and $s_4$, namely:

$$M = \begin{pmatrix}
.9 & .436 & 0 & 0 & 0 \\
.8 & 0 & .600 & 0 & 0 \\
.7 & 0 & 0 & .714 & 0 \\
.6 & 0 & 0 & 0 & .800
\end{pmatrix}$$

From their product NM we obtain the matrix K of
variances and covariances of the estimated factors, namely:

$$\begin{pmatrix}
.553 & .259 & .161 & .109 \\
1.152 & -.535 & -.333 & -.225 \\
-.737 & 1.313 & -.215 & -.145 \\
-.542 & -.253 & 1.242 & -.106 \\
-.415 & -.194 & -.121 & 1.169
\end{pmatrix}
\begin{pmatrix}
.9 & .436 & 0 & 0 & 0 \\
.8 & 0 & .600 & 0 & 0 \\
.7 & 0 & 0 & .714 & 0 \\
.6 & 0 & 0 & 0 & .800
\end{pmatrix}
=
\begin{pmatrix}
.880 & .241 & .155 & .115 & .087 \\
.241 & .502 & -.321 & -.238 & -.180 \\
.150 & -.321 & .788 & -.154 & -.116 \\
.116 & -.236 & -.152 & .887 & -.085 \\
.088 & -.181 & -.116 & -.086 & .935
\end{pmatrix}
= K$$

Again, we have a check on the accuracy of our arith-
metic, for K will, if we have been accurate, be exactly
symmetrical about its principal diagonal, i.e. its diagonal
running from north-west to south-east. The largest dis-
crepancy in our case is between .150 and .155. Moreover,
since in this case K includes all the factors, we have another
check which was not available when we calculated a K for
common factors only: the sum of the elements in the
principal diagonal (called the "trace," or in German the
"Spur") here must come out equal to the number of tests.
In our case we have

$$.880 + .502 + .788 + .887 + .935 = 3.992$$

and there are four tests. These elements which form the
trace of K are, it will be remembered, the variances of the
estimates $\hat{g}$, $\hat{s}_1$, $\hat{s}_2$, $\hat{s}_3$, and $\hat{s}_4$. So that we see that the total
variance of the five factors is no greater than the total
variance (viz. 4) of the four tests in standard measure.
This is only another instance of the general law that we
cannot get more out of anything than we put into it (at
any rate, not in the long run).
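Both checks, symmetry and trace, can be sketched in a few lines. The matrices are the rounded loadings as read from the text (an assumption, since the print is damaged), so individual cells of the computed K may differ from the printed ones in the third decimal place.

```python
# N: loadings of the five estimated factors with the four tests (as printed).
N = [
    [0.553, 0.259, 0.161, 0.109],
    [1.152, -0.535, -0.333, -0.225],
    [-0.737, 1.313, -0.215, -0.145],
    [-0.542, -0.253, 1.242, -0.106],
    [-0.415, -0.194, -0.121, 1.169],
]
# M: loadings of the four tests with g and the four specifics.
M = [
    [0.9, 0.436, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.600, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.714, 0.0],
    [0.6, 0.0, 0.0, 0.0, 0.800],
]

columns = list(zip(*M))
K = [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in N]

# Check 1: K should be symmetrical apart from rounding.
largest_gap = max(abs(K[i][j] - K[j][i]) for i in range(5) for j in range(5))

# Check 2: the trace should equal the number of tests.
trace = sum(K[i][i] for i in range(5))

print(f"largest asymmetry: {largest_gap:.3f}")
print(f"trace: {trace:.3f} (number of tests: {len(M)})")
```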

From K we can at once calculate the correlation of the
estimated factors. Adjusting the slight arithmetical de-
partures from symmetry, we get:

$$\begin{pmatrix}
1.000 & .362 & .184 & .131 & .096 \\
.362 & 1.000 & -.510 & -.354 & -.263 \\
.184 & -.510 & 1.000 & -.183 & -.135 \\
.131 & -.354 & -.183 & 1.000 & -.094 \\
.096 & -.263 & -.135 & -.094 & 1.000
\end{pmatrix}$$

from which we see that $\hat{g}$ is correlated with each of the
estimated specifics positively, while the latter are correlated
negatively among themselves, in this (a hierarchical)
example.

We have then this result, that although we set out to 
analyse our battery of tests into independent uncorrelated 
factors, the estimates which we make of these factors are 
correlated with one another, and instead of being in
standard measure have variances, and therefore standard
deviations, less than unity. We could, of course, make
them unity by dividing all our estimates by their calculated
standard deviation. But that would make no change in
their correlations.

The cause of all this is the excess of factors over tests,
and consequently this drawback, the correlation of the
estimates, depends upon the ratio of the number of factors
to the number of tests. The extra factors are the common
factors, for there is a specific to each test, and therefore 
with the same number of common factors the correlation 
between the estimates will decrease as the number of tests 
in the battery increases. Just as in the hierarchical case 
one of the tasks of the experimenter is to find tests to add 
to the number in his battery without destroying its hier- 
archical nature, so in the case of a Thurstone battery 
which can be reduced to rank 2, 3, 4 ... or r, a task 
will be to add tests to the battery which with suitable 
communalities will leave the rank unchanged and the pre- 
existing communalities unaltered, in order that the common 
factors may be the more accurately estimated, and the 
estimates be more nearly uncorrelated. 

With Thurstone batteries of tests, therefore, we arrive 
at the same necessity to " purify " any extended battery 
as we spoke of in Chapter II, Section 1, in the hierarchical 
case. Indeed, the need will be greater, for larger batteries 
will be required to reach the same accuracy of estimation 
with more extra factors.

CHAPTER VIII



MAXIMIZING AND MINIMIZING THE SPECIFICS 

1. A hierarchical battery. In Section 3 of Chapter III a 
brief reference was made to the fact that the Spearman 
Two-factor Method, and Thurstone's Minimal Rank 
Method, of factorizing batteries of tests maximize the 
variance of the specific factors, by reason of minimizing 
the number of common factors. In the present chapter 
we shall inquire further into this aspect, and describe a 
method of estimating factors (Bartlett, 1935, 1937), which 
in its turn endeavours to minimize the specifics again. 
First take the case of the analysis of a hierarchical battery. 
As was illustrated in Chapter III, the analysis of such a 
battery into one general factor only, and specifics, gives 
the maximum variance possible to the specifics. The 
combined communalities of the tests are less in the two- 
factor analysis than in any other analysis. In the matrix 
of correlations after it has been reduced to the lowest 
possible rank, the communalities occupy the principal 
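diagonal:

$$\begin{pmatrix}
h_1^2 & r_{12} & r_{13} & r_{14} \\
r_{12} & h_2^2 & r_{23} & r_{24} \\
r_{13} & r_{23} & h_3^2 & r_{34} \\
r_{14} & r_{24} & r_{34} & h_4^2
\end{pmatrix}$$
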

The mathematical expression of the above fact is that the 
trace of the reduced correlation matrix, i.e. the sum of the 
cells of the principal diagonal, is a minimum. 

It is true that certain exceptions to this statement are
mathematically possible, but their occurrence in actual
psychological work is a practical impossibility. They have
been investigated by Ledermann (Ledermann, 1940), who
finds, in the case of the hierarchical matrix, that an excep-
tion is only possible when one of the g saturations is greater
than the sum of all the others. When the battery is of
any size, this is most unlikely to occur: and almost always,
when it did occur, the large saturation of one test would
turn out to be greater than unity, which is not permissible
(the Heywood case).*

2. Batteries of higher rank. The same general statement 
as the above, that the specifics are maximized, is also true 
of Thurstone's system, of which its predecessor (Spearman's 
two-factor system) is a special case. The communalities 
which give the matrix its lowest rank are in sum less than 
any other diagonal elements permissible. If numbers 
smaller than the Thurstone communalities are placed in 
the diagonal cells, the analysis fails unless factors with a
loading of $\sqrt{-1}$ are employed (Vectors, page 103), and
such factors are, of course, inadmissible.

Here again there are possibly cases where the lowest 
rank is not accompanied by the lowest trace (i.e. the lowest 
sum of the communalities). But here again it seems cer- 
tain that if such cases do exist, they are mathematical 
curiosities which would never occur in practice. 

As an illustration the reader may use the example of 
Chapter II, Section 9 : 

5 

5883 
2852 
2852 
1480 



As we there saw, this matrix can be reduced to rank 2
by the unique set of communalities

.7    .7    .7    .13030    .5

and we found there that, if we wanted to attain rank 2,
we could not, for example, reduce the first communality
to .5.

* See Chapter XV, Section 5, page 281.

We can, however, reduce the first communality to .5 if
we are willing to accept a higher rank than 2, that is, if
we are willing to accept more common factors than two.
But we find in that case that the remaining communalities
necessarily rise so as to annul, and more than annul, the
saving in communality achieved on the first test. We
find ourselves bound to take the second communality more
than the former .7, or inadmissible consequences ensue.
We have a certain latitude in its choice, but there is a
lower limit somewhere between .7 and .8 below which it
makes the matrix inadmissible. Let us take .8 as the
second communality (having thus still made a gross saving
on the former communalities of .7 and .7) and calculate
the remaining communalities, now fixed, which give rank 3.
We can do this by the same process of pivotal condensation
used in Chapter II, Section 9, making this time the matrix
consist of nothing but zeros after three condensations (for
rank 3) and then working back to the communalities. We
find for the five communalities

.5    .8    .65474    .14592    .80786

with a sum of 2.90852 for the total communality (or trace)
compared with the total of 2.73030 with rank 2. Our
attempt to save communality by reducing that of the
first test from .7 to .5 and letting the rank rise has been
foiled. The minimum rank carries with it, in all practically
possible cases, the minimum communality and the maxi-
mum specific variance. Minimizing the number of common
factors maximizes the specific variance.

3. Error specifics. That some of the variance of a test 
will probably be unique to that particular test given on 
that particular occasion is clear ; there will be an error 
specific. But not all errors in testing will produce unique 
or specific factors. The errors will include sheer blunders, 
such as mistakes in recording results ; sampling errors due 
to the particular set of persons tested ; and variable chance 
errors in the performances of the individuals. The first 
can with care be reduced to infinitesimal proportions. 
Sampling errors will be discussed in Chapter X, and we will 
only say here that they will in many or most cases produce 
not specific but common factors. The variable chance
errors in the performances of the individual may be unique
to each test, but often they too will run through several
tests, as when a candidate has a slight toothache, or is
elated by good news, or disturbed by a street organ, all
of which things may affect several tests if they are adminis-
tered on the same day. The "unreliability" of a test,
due to variable chance errors, is caused by factors which
are unique not to the test but to the occasion. Tests a and b
performed to-day, and repeated as Tests a' and b' to-
morrow, may have reliabilities less than unity, yet the
chance errors of to-day may link a and b, and the chance
errors of to-morrow may link a' and b'. Nevertheless
some of the error variance will doubtless be unique, but
surely nothing like the amount of specific variance due to
the Thurstone principle of minimizing the number of com-
mon factors can be due to this.

There remains the true specific of each test. It does not 
seem unreasonable to suppose that such exist, though it is 
not easy to imagine them existing before the test is given. 
The ordinary idea of specific factors would be tricks 
learned by doing that particular test, as a motor-car or 
a rifle may have and usually does have idiosyncrasies 
unknown to the stranger. But it seems questionable 
whether a method of analysis is justifiable which makes 
specific factors play so large a part. 

4. Shorthand descriptions. It is to be observed that an 
analysis using the minimal number of common factors, and 
with maximized specific variance, is capable of reproducing 
the correlation coefficients exactly by means of these few 
common factors, and in the case of an artificial example 
will actually do so ; while in the case of an experimental 
example including errors, it will do so at least as well as 
any other method. If this is our sole purpose, therefore, 
the Thurstone type of analysis is best, since it uses fewest 
factors. 

But the few common factors of a Thurstone analysis do 
not enable us to reproduce the original test scores from 
which we began, they do not enable us to describe all the 
powers of our population of persons very well. With the 
same number of Hotelling's "principal components" as
Thurstone has of common factors we could arrive at a
better description of the scores, though a worse one of the
correlations. The reader may reply that he does not want 
factors for the purpose of reproducing either the original 
scores or the original correlations, for he possesses these 
already ! But what we really mean, and what it is very 
convenient to have, is a concise shorthand description, and 
the system we prefer will depend largely on our motives, 
whether we have a practical end in view or are urged by 
theoretical curiosity. The chief practical incentive is the 
hope that factors will somehow enable better vocational 
and educational predictions to be made. Mathematically, 
however, as we have seen, this is impossible. If the use of 
factors turns out to improve vocational advice it will be 
for some other reason than a mathematical one. For 
vocational or educational prediction means, mathemati- 
cally, projecting a point given by n oblique co-ordinate 
axes called tests on to a vector representing the occupation, 
whose angles with the tests are known, but which is not 
in the n-space of the tests. The use of factors merely
means referring the point in question to a new set of co- 
ordinate axes called factors, a procedure which cannot 
define the point any better and, unless care is taken, may 
define it worse, nor does the change of axes in any way 
facilitate the projection on to the occupation vector. 
Moreover, the task of carrying out prediction with the aid 
of factors is rendered more difficult by the circumstance 
that the popular systems use more factors than there are 
tests, so that the factors themselves have to be estimated. 
In addition, it is usual to estimate only the common 
factors, throwing away the maximum amount of variance 
unique to each test, maximized by insisting on as few com- 
mon factors as possible. If there is any guarantee that 
these abandoned portions of the test variance are un- 
correlated with the occupation to be predicted, no harm is 
done. But the circumstances under which this guarantee 
can be given are precisely those circumstances under which 
a direct prediction without the intervention of factors 
can easily be made. 

5. Bartlett's estimates of common factors. Since, then,
the Thurstone system suffers, from a practical point of
view, from this handicap of throwing away all information
which can possibly be ascribed, rightly or wrongly, to
specific factors, there is a peculiar interest in the proposal
(M. S. Bartlett, 1935, 1937a, 1938) to estimate the common
factors, not by the regression method of the previous
chapter, but by a method* which minimizes the sum of
the squares of a man's specific factors (already maximized
by the principle of using few common factors).

The way in which Bartlett's estimates differ from 
regression estimates of factors can be very clearly seen by 
thinking in terms of the geometrical picture already used 
in earlier chapters (see Figures 14 to 20). When the 
factors outnumber the tests, the vectors representing the 
former are in a space of higher dimensions than the test 
space. 

The individual person is represented in the test space 
by a point, namely that point P whose projections on to 
the test vectors give his test scores. We do not know a 
representative point for this individual in the complete 
factor space, however. His representative point Q may 
be, for all we know, anywhere in the subspace which is 
perpendicular to the test space and intersects with it at 
P. In these circumstances the regression method takes 
refuge in the assumption that this individual is average 
in all qualities of which we know nothing ; that is, in all 
qualities orthogonal to our test space. It therefore 
assumes P to be his point also in the factor space, and 
projects P on to the factor axes to get the factor estimates 
for him. 

Bartlett's method is, in the present writer's opinion, 
equivalent to a different assumption about the position of 
the point Q. Within the complete factor space there is a 
subspace which contains the common factors. Of all the 
positions open to the point Q, Bartlett's method chooses 
that one which is nearest to the common-factor space, and 
from thence projects on to the common-factor vectors. 
This is equivalent to making the assumption that this man
is not average in the qualities about which we know nothing,
but instead possesses in those unknown qualities just those
degrees of excellence which bring his representative point
to the chosen point Q.

* See Appendix, paragraph 13.

Both the regression method and Bartlett's method make 
assumptions about qualities which are quite unknown to 
us, and are quite uncorrelated with the tests we know. 
The regression assumption is that the man is average in 
these, Bartlett's assumption is that he is not average ; and 
because men are most frequently near the average, the 
regression assumption seems more likely to be correct. 
The other assumption can be justified only by its utility 
in attaining special ends ; it cannot be the most generally 
useful assumption. 

6. Their geometrical interpretation. All this can be most
clearly seen (because a perspective diagram can be made)
in the case of estimating one general factor g only, the
hierarchical case. A figure like Figure 19 will illustrate
this case, if we take y and z there to be two tests and x to
be the g vector (see Figure 21). The man's representative
point in the yz plane is P. But we do not know his
representative point Q in solid three-dimensional space, only
that it is somewhere on the line P'PP". The regression
method assumes that it is actually at P, the average, and
projects P itself on to the g line to get the estimate OX of g.
Bartlett's method, on the other hand, assumes that Q is at
that point on P'PP" where it most nearly approaches the g
line, that is, somewhere near the position P' in our diagram.
Bartlett's estimate of g is then represented by OX'.

Figure 21.

Now, any point on the line P'PP", when projected on to
the test vectors y and z, gives the same two test scores
Y and Z. There is, in general, no point on the line g which
does this exactly. But clearly X', of all the points on g,
will be the point whose projections most nearly fall on
Y and Z, for X' is as near as possible to the line P'PP".
That is, the projection of X' on to the plane of the tests
falls as near to the point P as is possible. In other words,
if we ignore the specifics entirely and use only the estimated
g in the specification of y and z, Bartlett's estimate comes
as near as is possible to giving us back the full scores OY
and OZ. If the regression estimate OX is projected on to
the lines y and z, it will obviously give a worse approxima-
tion (much worse in our figure) to OY and OZ.

The regression method, in order to recover as much as 
possible of the original scores, would have to make a 
second estimate of them. For the estimates of g repre- 
sented by quantities like OX are not in standard measure. 
Before projecting the point X on to the lines y and z, 
therefore, to recover the original scores as far as possible, 
the regression method would alter the scale of its space 
along the g vector until the quantities like OX were in 
standard measure. This would not only change the posi- 
tion of X on the line, it would change the angles which 
the lines in the figure make with one another ; and would 
change them exactly in such a manner that, in the new space, 
the projection of OX on to y and z would fall exactly where 
the Bartlett projections from X' fall in the present space 
(Thomson, 1938a). 

There is, therefore, no final difference in excellence 
between the two methods in the matter of restoring the 
original scores as fully as possible, but the regression 
method takes two bites at the cherry. On the other hand, 
the regression estimates can be put straight into the speci- 
fication equation of an occupation which is known to 
require just these common factors, whereas here it is the 
Bartlett method which has to have a second shot. 

Both methods have to change their estimate of g when 
a new test is added to the battery. For the man is not 
very likely to have, in the specific of this new test, either 
the average value previously assumed by the regression 
method, or the special value assumed by the Bartlett 
method. But he is more likely to have the former than 
the latter, so the Bartlett estimates will change more
than do the regression estimates as the battery grows.
Ultimately, when the number of tests becomes infinite, the
two forms of estimate will agree.

7. A numerical example. In the case of estimates of
one general factor g from a hierarchical battery, the
Bartlett estimates differ from the regression estimates only
in scale. They put the candidates in the same order of
merit for g as do the regression estimates, but give them a
greater scatter, making the high g's higher and the low g's
lower. The formula is

$$\check{g} = \frac{1}{S} \sum_i \frac{r_{ig}}{1 - r_{ig}^2} z_i$$

instead of Spearman's

$$\hat{g} = \frac{1}{1 + S} \sum_i \frac{r_{ig}}{1 - r_{ig}^2} z_i \quad \text{(see page 106)},$$

where $S = \sum_i r_{ig}^2 / (1 - r_{ig}^2)$.
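The difference in scale between the two estimates of g can be illustrated with a small sketch. It assumes the usual single-factor forms of the two formulas (divisor S for Bartlett, 1 + S for Spearman's regression estimate), the g saturations .9, .8, .7, .6 used earlier, and a purely hypothetical set of standard scores.

```python
def saturation_sum(r):
    """S = sum of r^2 / (1 - r^2) over the tests."""
    return sum(x * x / (1 - x * x) for x in r)


def weighted_score(r, z):
    """Sum of r / (1 - r^2) times the standard score, over the tests."""
    return sum((x / (1 - x * x)) * s for x, s in zip(r, z))


def regression_g(r, z):
    return weighted_score(r, z) / (1 + saturation_sum(r))


def bartlett_g(r, z):
    return weighted_score(r, z) / saturation_sum(r)


r = [0.9, 0.8, 0.7, 0.6]        # g saturations
z = [1.0, 0.5, -0.2, 0.3]       # hypothetical scores, for illustration only

S = saturation_sum(r)
print(f"regression estimate: {regression_g(r, z):.3f}")
print(f"Bartlett estimate:   {bartlett_g(r, z):.3f}")
print(f"ratio (1 + S) / S:   {(1 + S) / S:.3f}")
```

Whatever the scores, the two estimates stand in the fixed ratio (1 + S)/S, which is why the order of merit is unchanged while the scatter is increased.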

With more than one common factor, the connexion
between the two kinds of estimate is not so simple (Appen-
dix, Section 13). The mathematical reader will be able to
calculate the Bartlett factor estimates from the matrix
formulae given in the Appendix. We shall here calculate
them, for the example of Chapter VII, Section 5, from the
regression estimates there given, and their matrix of
variances and covariances given in Section 8 of that chapter.

For if the matrix of regression loadings be represented
by N, and the matrix of variances and covariances of the
regression factors by K, then the matrix of Bartlett load-
ings can be shown (Bartlett, 1938) to be

$$K^{-1} N$$

This matrix multiplication can be carried out by Aitken's
pivotal condensation also. For it has been shown (Aitken,
1937a) that the pivotal condensation of a pattern of three
matrices arranged thus:

$$\begin{array}{c|c} Y & Z \\ \hline -X & \cdot \end{array}$$

gives, when by repeated condensations all numbers have
been removed from the left-hand block, the triple product
$X Y^{-1} Z$. We shall therefore obtain the Bartlett loadings
for estimating the factors from the tests if we condense

$$\begin{array}{c|c} K & N \\ \hline -I & \cdot \end{array}$$

where I is the unit matrix which has unity in each cell of
the principal diagonal and zeros elsewhere. The matrices
K* and N are taken from pages 125 and 124, and the
whole calculation is as follows (to three places of decimals
only, to facilitate the arithmetic for readers who wish to
check it):

[Working table: successive pivotal condensations of the pattern
above, with a check column. The intermediate rows are not
legible in this copy; the final condensation yields the
Bartlett loadings.]


The Bartlett estimates of the factors, therefore, which
we shall distinguish from the regression estimates by
turning the circumflex accent upside down, are

$$\check{g} = .232 z_1 - .180 z_2 + 1.305 z_3 - .058 z_4$$
$$\check{v} = .545 z_1 + 1.083 z_2 - 1.168 z_3 - .164 z_4$$
$$\check{F} = .198 z_1 - .158 z_2 - .742 z_3 + 1.348 z_4$$

* Slightly corrected to make it symmetrical.
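The triple product can also be obtained by ordinary elimination, which is sketched here in place of the tabular condensation layout. The function name `triple_product` is invented for this illustration, and the symmetrical K is an assumption: it simply averages the slightly unequal off-diagonal pairs of the K found in Chapter VII, which may not be exactly the correction used in the book.

```python
def triple_product(X, Y, Z):
    """Return X Y^-1 Z by clearing the left block of the pattern [[Y, Z], [-X, 0]].

    Assumes Y is square with non-zero pivots (no row interchanges are made).
    """
    n, p, q = len(Y), len(Z[0]), len(X)
    rows = [list(Y[i]) + list(Z[i]) for i in range(n)]
    rows += [[-x for x in X[i]] + [0.0] * p for i in range(q)]
    for k in range(n):
        pivot_row = rows[k]
        for i in range(k + 1, n + q):
            factor = rows[i][k] / pivot_row[k]
            rows[i] = [a - factor * b for a, b in zip(rows[i], pivot_row)]
    # What remains to the right of the cleared block is X Y^-1 Z.
    return [row[n:] for row in rows[n:]]


# K made symmetrical by averaging (an assumption), and N as printed.
K = [
    [0.676, 0.2185, 0.1285],
    [0.2185, 0.567, -0.0345],
    [0.1285, -0.0345, 0.556],
]
N = [
    [0.300, 0.095, 0.532, 0.095],
    [0.353, 0.581, -0.352, -0.153],
    [0.121, -0.148, -0.206, 0.747],
]
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

bartlett = triple_product(I, K, N)    # K^-1 N
for row in bartlett:
    print(" ".join(f"{value:7.3f}" for value in row))
```

The loadings so found agree with those stated in the text to about two places of decimals; the remaining differences reflect the rounding of K and N.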

In Chapter VII, Section 5, we imagined a man whose 
scores in the four tests, in standard deviations, were 



*! * 2 

2 -4 



* 4 

6 



and calculated the regression estimates of his three factors, 
g, v 9 and F. By inserting his test scores in the above 
equations we can find, for comparison, the Bartlett esti- 
mates of his factors, shown in the following table : 

Factors                     g         v         F

Regression estimates      .451     -.500      .387
Bartlett estimates        .997    -1.240      .392

This illustrates the tendency of the Bartlett estimates to
be farther from the average than the regression estimates
are.
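The substitution of the man's scores into the three Bartlett equations can be checked arithmetically. The following Python sketch is only an illustration: the signs of the coefficients and the four scores (.2, -.4, .7, .6) are an assumed reading of the printed values, and the names `BARTLETT_WEIGHTS`, `SCORES` and `bartlett_estimates` are introduced here for the purpose.

```python
# Coefficients of the Bartlett estimates of g, v and F.
# The signs are an assumed reading of the printed equations.
BARTLETT_WEIGHTS = {
    "g": (0.232, -0.180, 1.305, -0.058),
    "v": (0.545, 1.083, -1.168, -0.164),
    "F": (0.198, -0.158, -0.742, 1.348),
}

# Standardized scores of the imagined man in the four tests (assumed values).
SCORES = (0.2, -0.4, 0.7, 0.6)


def bartlett_estimates(weights, scores):
    """Return each factor estimate as a weighted sum of the test scores."""
    return {
        factor: sum(w * z for w, z in zip(row, scores))
        for factor, row in weights.items()
    }


estimates = bartlett_estimates(BARTLETT_WEIGHTS, SCORES)
for factor, value in estimates.items():
    print(f"{factor}: {value:+.3f}")
```

Under these assumptions the three sums agree, to three decimals, with the Bartlett row of the table, which is some reassurance that the signs have been read correctly.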



PART III 

THE INFLUENCE OF SAMPLING AND 
SELECTION OF THE PERSONS 



CHAPTER IX 

SAMPLING ERROR AND THE THEORY OF TWO FACTORS

1. Sampling the population of persons. In the previous 
pages we have seldom mentioned sampling errors. There 
is an implicit reference to them in Chapter I, where a 
portion of an actual experimental matrix of correlations is 
shown as a contrast to the artificial ones used in the text ; 
and later in that chapter there is a closer approach to the 
difficulties caused by sampling errors. But apart from 
this, and perhaps one or two other references, the exposition 
in Parts I and II is entirely free from any consideration 
of them. The examples are made and worked as if on 
every occasion the whole population of people concerned 
had been accurately tested. 

The advantage of this is that it makes the theoretical 
principles stand out clearly, unobscured by the sampling 
difficulty. As a result, to mention one important point, 
it is thus made clear that the difficulties of estimating 
factors, described in Chapters VII and VIII, have nothing 
directly to do with sampling the population, but are due 
to having more factors than tests. It is true that an abso- 
lutely clean cut between an exposition which considers 
sampling errors, and one which disregards them, cannot 
be made. For sampling errors introduce error factors, and 
thereby swell the total number of factors. But even were 
the whole population of persons tested, factors which out- 
number the tests would remain " indeterminate," as it is 
sometimes expressed, meaning that they can only be 
estimated, not measured exactly. 

Another kind of sampling, however, does exist in Parts I 
and II, a sampling of the tests. We have there assumed 
that the whole population of persons is tested, but we 
have not supposed that they were plied with the whole 
population of tests. It is difficult perhaps to say what
" the whole population of tests " means, but at any rate it is
clear that in Parts I and II we were using only a few, not 
all possible tests. There is thus in our subject a double 
sampling problem, and this makes it very difficult. In the 
present section of this book (Part III) we shall consider the 
effects of sampling the population of persons. 

The general idea underlying the notion of a sampling 
error is not a difficult one. Take, for example, the average 
height of all living Englishmen who are of full age. This 
could, if need be, be ascertained by the process of measuring 
every living Englishman of full age. Actually this has 
never been done, and when anyone makes a statement 
such as " The average height of Englishmen is 67½ inches,"
he is basing it upon a sample only. This sample may 
not be an unbiased one. Indeed, samples of Englishmen 
whose height has been officially recorded are heavily loaded 
with certain classes of Englishmen, for example, prisoners
in gaol, and unemployed young men joining the army of 
preconscription days. The average height of such men 
may well differ from that of all Englishmen. But when we 
speak of sampling error, we do not mean error due to the 
sample being known to be a biased one. Even if the sample 
of Englishmen used to find the average height of their race 
were, as far as could be seen, a perfectly fair sample, 
containing the proper proportion of all classes of the 
community and of all adult ages, etc., it yet would not 
necessarily yield an average exactly equal to that of all 
Englishmen. Several apparent replicas of the sample 
would yield different averages. It is these differences, 
between statistics gathered from different but equally 
good samples, that we mean by sampling errors. 

It is worth while calling attention at this point to a 
general fact which will be found of importance at a later 
stage of this book. The true average height of Englishmen 
is only so by definition, and does not in principle differ 
from the average of a sample. We had to define the popu- 
lation we had in mind as " all living Englishmen of full 
age." This is a perfectly well-marked body of men. But 
it is itself in its turn only a sample : a sample of all living 
Europeans, or all living men. It is, indeed, altering daily
and hourly as men die or reach the age of 21, and each
generation is a sample of those that have been and may be. 
Those who reach the age of 21 are only some, and therefore 
only a sample, of those born. And even those born are 
only a sample of those who might have been born had 
times been better or had there been no war, or a tax on 
bachelors. So the idea of sampling is a relative one, and 
the " complete population " from which we take samples 
is a matter of definition only. The mathematical problem 
in connexion with sampling which it is desirable to solve 
if possible for each statistic is to find the complete law of 
its distribution when it is derived from each of a large 
number of samples of a given size. Mathematically this 
is often very difficult, and frequently we have to be 
content with a formula which gives its approximate 
variance if certain assumptions are allowed and certain 
small quantities are neglected. 

Sampling problems are of two kinds, direct and inverse. 
The easier kind of problem is to say what the distribution 
of a statistic will be in samples of a given size when we 
know all about the true values in the whole population : 
the more difficult kind is to estimate what the true value 
of a statistic is in a complete population when we know 
its observed value in certain samples. They differ as 
do problems of interpolation and extrapolation. As an 
example of the direct kind of problem let us suppose that 
we actually knew the height of every adult Englishman 
of full age. We could then, on being told a certain sample 
of p Englishmen averaged such and such a height, calculate 
the probability that this sample was a random sample, a 
probability that would obviously grow less as the average 
of the sample departed from the average of the whole 
population. It would also depend on the size of the sample, 
for if a very large sample deviates far from the true average, 
it is less likely to be random, more likely to have some 
reason for the difference, than a small sample with the 
same average would have. 

2. The normal curve. By the distribution of a certain 
variable in the population we mean the curve (usually 
expressed as an equation) showing its frequency of occurrence
for each possible value. Thus the curve in Figure 22
might show the distribution of height in living adult 
Englishmen, by its height above the base line at each point. 
More men (represented by the line MN) have the average 
height, 67½ inches, than have the height 73 inches, the
frequency of the latter being shown by the line PQ. The 
shaded area represents all men whose height is 73 inches 
or more, and its ratio to the area under the whole curve 
is the probability that an Englishman taken absolutely at 
random will have a height of 73 inches or more. 

Very often distributions are, at any rate approximately, 
of a certain shape called the " normal curve." The normal 
curve has a known equation, it is symmetrical about its 
mid point, and with the aid of published tables can be
drawn accurately (or reproduced arithmetically) if we know
the mid point M (which is the average of the measurements)
and a certain distance ST or S'T (which is equal to the
standard deviation of the measurements).
S and S' are the points where the curve changes from 
being convex to being concave. 

If the distribution of a variable, say the heights of adult 
Englishmen, is " normal," then the distribution of the 
means of samples of p Englishmen's heights will also be 
normal, but will be more closely concentrated about the 
point M than are the measurements of individuals : in 
point of fact, its variance will be p times smaller, its
standard deviation thus $\sqrt{p}$ times smaller. That is to
say, if we take sample after sample of 25 Englishmen 
each time, and for each sample record the average height, 
the means thus accumulated will be distributed in a curve 
of the same shape as that of Figure 22, but narrower from 
side to side, so that SS' would be one-fifth ($1/\sqrt{25}$) of what
it is in Figure 22, which is the distribution of single
measurements.

[Figure 22. Distribution of height among adult Englishmen; horizontal axis in inches, 61 to 76.]

If a sample were made with some special end in view,
such as ascertaining whether red-headed men tend to be 
tall, we would decide whether we had detected such a 
tendency by calculating the probability that a mean such 
as our red-headed sample showed, or a mean still further 
away from M, would occur at random. For this purpose
we would compare the deviation of our sample from M 
with the standard deviation of the distribution of such 
samples, obtained by dividing the standard deviation of 
individuals by the square root of p, the number in the 
sample. The ratio of the deviation found, to the standard 
deviation, is the criterion, and the larger it is the more 
likely is it that red-headed men really do tend to be tall.
For most practical purposes we take a deviation of over 
three times the standard deviation as " significant." 
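The criterion just described can be sketched in a few lines of Python. The figures used (a population mean of 67.5 inches, an individual standard deviation of 2.5 inches, and 25 red-headed men averaging 68.5 inches) are hypothetical and chosen only to make the arithmetic easy; the function names are likewise introduced here for illustration.

```python
import math


def standard_error_of_mean(sd_individuals, p):
    """Standard deviation of the means of samples of p persons."""
    return sd_individuals / math.sqrt(p)


def deviation_ratio(sample_mean, population_mean, sd_individuals, p):
    """Deviation of the sample mean from M, in units of its standard error."""
    return (sample_mean - population_mean) / standard_error_of_mean(sd_individuals, p)


def upper_tail_probability(ratio):
    """Chance that a normal deviate lies at least `ratio` standard deviations above M."""
    return 0.5 * math.erfc(ratio / math.sqrt(2))


# Hypothetical figures, not taken from any real survey.
ratio = deviation_ratio(68.5, 67.5, 2.5, 25)
print(ratio, upper_tail_probability(ratio), ratio > 3)
```

With these invented figures the ratio is 2, which falls short of the conventional limit of three standard deviations, so the sample would not be called significant.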

Sometimes the reader will find significance questions 
discussed in terms of the " probable error " instead of the 
standard deviation. The probable error is best considered 
as a conventional reduction of the standard deviation (or 
standard error, as it is sometimes called) to two-thirds of 
its value (more exactly, to .67449 of its value).
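The reason for this particular fraction can be illustrated numerically: in a normal distribution, half of all cases lie within .67449 standard deviations of the mean. The short Python sketch below checks this with the error function; the helper names are assumptions made for the illustration.

```python
import math

PROBABLE_ERROR_FACTOR = 0.67449


def probable_error(standard_error):
    """Convert a standard error into the corresponding probable error."""
    return PROBABLE_ERROR_FACTOR * standard_error


def central_probability(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


print(probable_error(1.0), central_probability(PROBABLE_ERROR_FACTOR))
```

A deviation of three times the standard deviation therefore corresponds to roughly four and a half times the probable error.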

Not only would the average height, or the average weight, 
of the sample of red-headed men differ from sample to
sample. Statistics calculated in more complex ways from 
the measurements will also vary from sample to sample, 
as, for example, the variance of height, or the variance of 
weight, or the correlation of height and weight. Let us 
consider first the variance of the heights. In the whole 
population this is calculated by finding the mean, expres- 
sing every height as a plus or minus deviation from the 
mean, squaring all these deviations, and dividing the sum 
by the number in the population. 

This is also how we would find the variance of the sample 
if we really want the variance of the sample. But if we 
want an estimate of the variance in the whole population, 
and the sample is small, it is better to divide by one less 
than the number in the sample. A glimpse of the reason 
for this can be got by considering the case of the smallest 
possible sample, namely, one man. Here the mean of the 
sample is the one height that we have measured, and the
deviation of that measurement from the mean of the sample
is zero. The formula if we divide by the number in the 
sample (one) will give zero for the variance and that is 
correct for the sample. But it would be too bold to estimate 
the variance of the whole population from one measurement : 
if we divide by one less than the sample we get variance 
= 0/0, that is, we don't know, which is a wiser statement.* 

More generally we can begin to understand the reason 
for dividing by $(p - 1)$ instead of by p by the following
considerations. 

The quantity we want to estimate is the mean square 
deviation of the measurements of the whole population, 
the deviations being taken from the mean of that whole 
population. We do not, however, know that true mean, 
and therefore in a sample we are reduced to using the mean 
of the sample, which except by a miracle will not exactly 
coincide with the true or population mean. The conse- 
quence is that the sum of the squares we obtain is smaller 
than it would have been had we known and used the true 
mean. For it is a property of a mean that the sum of the 
squares of deviations from it is smaller than of deviations 
from any other point. 

Consider for example the numbers 2, 3 and 7. Their
mean is 4, and the sum of the squares about 4 is

$$(-2)^2 + (-1)^2 + 3^2 = 14$$

About any other point this sum will be greater than 14.
About 5, for example, the sum is

$$(-3)^2 + (-2)^2 + 2^2 = 17$$

About 2 the sum is

$$0^2 + 1^2 + 5^2 = 26$$

It follows that the sum of the squares we obtained by
using the sample mean was as small as possible, and in the
immense majority of cases smaller than the sum about the
true mean. It is to compensate for this that we divide
by $(p - 1)$ instead of by p.

* It is important to remember that sampling the population is not
the only source of error in the measurement of statistics, e.g. the
correlation coefficient. All sorts of influences may disturb it. These
will usually " attenuate " the correlation coefficient, i.e. tend to
bring it nearer to zero, as can be seen when we consider that a perfect
correlation only can be reduced by error. But they will not always
do so, and if the errors in the two trait measurements are themselves
correlated, they may even increase the true correlations in a majority
of cases. An estimate of the amount of variable error present can
be made from the correlation of two measurements of the same
trait on the same group, a correlation called the " reliability," which
should be perfect if no variable errors are present. Spearman's cor-
rection for attenuation (see Brown and Thomson, 1925, 156) is based
upon this. Like all estimates, the correction for attenuation is correct,
even if the errors are uncorrelated, only on the average and not in
each instance, and it should never be used unless it is small. If it
is large, the experiments are " unreliable " and should be improved.
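The example of the numbers 2, 3 and 7 is easily repeated by machine. The Python sketch below (the function name is introduced here for illustration) computes the sum of squares about the mean and about the two other points mentioned, and then confirms over a grid of trial points that none gives a smaller sum than the mean does.

```python
def sum_of_squares_about(values, point):
    """Sum of the squared deviations of the values from a chosen point."""
    return sum((x - point) ** 2 for x in values)


values = [2, 3, 7]
mean = sum(values) / len(values)

about_mean = sum_of_squares_about(values, mean)
about_five = sum_of_squares_about(values, 5)
about_two = sum_of_squares_about(values, 2)
print(about_mean, about_five, about_two)

# Try points from 0.0 to 10.0 in steps of one-tenth.
trial_points = [k / 10 for k in range(0, 101)]
smallest = min(sum_of_squares_about(values, t) for t in trial_points)
print(smallest)
```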

These elementary considerations do not of course indi- 
cate just why this procedure should, in the long run, ex- 
actly compensate for using the sample mean. Why not 
$(p - 2)$, one might say, or $(p - 3)$ ? It is not possible, in
an elementary account like the present, to answer this. 
Geometrical considerations, however, throw some further 
light on the problem. The p measurements of the sample 
may be thought of as existing in a certain space of $(p - 1)$
dimensions. For example, two points define a line (of one 
dimension), three points define a plane (of two dimensions) 
and so on. The true mean of the whole population is not 
likely to be within that space, whereas the mean of the 
sample is. The deviations we have actually squared and 
summed are therefore in a space of one dimension less than 
the space containing the true mean. One " degree of free- 
dom " has been lost by the fact that we have forced the 
lines we are squaring to exist in a space of $(p - 1)$ dimensions
instead of permitting them to project into a
p-space. Hence the division by $(p - 1)$ instead of p.

This principle goes further. For each statistic which we 
calculate from the sample itself and use in our subsequent 
calculations, we lose a " degree of freedom." 
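Although the text gives no proof that $(p - 1)$ is exactly the right divisor, the claim can at least be verified in a small case by complete enumeration. The Python sketch below is an illustration under stated assumptions: the "population" is taken to be the three numbers 2, 3 and 7, samples are drawn with replacement so that every ordered sample is equally likely, and exact fractions are used to avoid rounding. The function names are introduced here.

```python
from fractions import Fraction
from itertools import product


def population_variance(population):
    """Mean squared deviation from the population mean (divisor: the number in the population)."""
    n = len(population)
    mean = Fraction(sum(population), n)
    return sum((Fraction(x) - mean) ** 2 for x in population) / n


def mean_of_estimates(population, p, divisor):
    """Average, over every ordered sample of size p, of (sum of squares about the sample mean) / divisor."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=p):
        sample_mean = Fraction(sum(sample), p)
        sum_sq = sum((Fraction(x) - sample_mean) ** 2 for x in sample)
        total += sum_sq / divisor
        count += 1
    return total / count


POPULATION = (2, 3, 7)
print(population_variance(POPULATION))       # true variance
print(mean_of_estimates(POPULATION, 2, 1))   # samples of 2, divisor p - 1
print(mean_of_estimates(POPULATION, 2, 2))   # samples of 2, divisor p
```

Dividing by $(p - 1)$ reproduces the population variance on the average, while dividing by p gives a value that is too small.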

The standard error of a variance u, if the parent popula- 
tion from which the samples are drawn is normally distri- 
buted, is estimated as 



where p is the number of persons in the sample. The
standard error of a correlation coefficient r is, with the
same condition, estimated as

$$\frac{1 - r^2}{\sqrt{p - 1}}$$

The use of this standard error, however, should be dis- 
continued (unless the sample is large and r small). 

Fisher (1938, page 202) has pointed out that the use of the 
formula for the standard error of a correlation coefficient 
is valid only when the number in the sample is large and 
when the true value of the correlation does not approach
$\pm 1$. For in small samples the distribution of r is not
normal, and even in large samples it is far from normal 
for high correlations. The distribution of r for samples 
from a population where the correlation is zero differs 
markedly from that where the correlation is, say, 0.8.
This means that the use of a standard error for testing 
the significance of correlation coefficients should, except 
under the above conditions, be discouraged. 

To get over the difficulty Fisher transforms r into a new
variable z given by

$$z = \tfrac{1}{2} \{ \log_e (1 + r) - \log_e (1 - r) \}$$

It is not, however, necessary to use this formula, as com-
plete tables have been published for converting values
of r into the corresponding values of z. As r goes from $-1$
to $+1$, z goes from $-\infty$ to $+\infty$, and $r = 0$ corresponds
to $z = 0$.

The great advantage of using z as a variable instead of r 
is that the form of the distribution of z depends very little 
upon the value of the correlation in the population from 
which samples are drawn. Though not strictly normal, it 
tends to normality rapidly as the size of the sample is 
increased, and even for small samples the assumption 
of normality is adequate for all practical purposes. The 
standard deviation of z may in all cases be taken to be
$1/\sqrt{p - 3}$, where p is the number of persons in the sample.
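The quantities of this section are simple to compute without tables. The Python sketch below assumes the usual forms: the standard error of r taken as $(1 - r^2)/\sqrt{p - 1}$, Fisher's z as half the natural logarithm of $(1 + r)/(1 - r)$, and the standard deviation of z as $1/\sqrt{p - 3}$. The function names are introduced here for illustration.

```python
import math


def standard_error_of_r(r, p):
    """Older large-sample standard error of a correlation coefficient."""
    return (1 - r * r) / math.sqrt(p - 1)


def fisher_z(r):
    """Fisher's transformation of r; identical to the inverse hyperbolic tangent."""
    return 0.5 * math.log((1 + r) / (1 - r))


def standard_error_of_z(p):
    """Standard deviation of z for a sample of p persons."""
    return 1 / math.sqrt(p - 3)


for r in (0.0, 0.42, 0.56, 0.8):
    print(r, round(fisher_z(r), 3), round(standard_error_of_r(r, 50), 3))
print(round(standard_error_of_z(50), 3))
```

The standard error of z does not involve r at all, which is the practical advantage of the transformation: the same yardstick serves whatever the correlation in the population.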

3. Error of a single tetrad-difference. For our discussion 
of the influence of sampling on the factorial analysis of
tests one of the most important quantities to know is the
standard error of the tetrad-difference. There has been 
much debate concerning the proper formula for this. (See 
Spearman and Holzinger, 1924, 1925, 1929 ; Pearson and 
Moul, 1927 ; Wishart, 1928 ; Pearson, Jeffery, and Elder- 
ton, 1929 ; Spearman, 1931.) That generally employed is 
formula (16) in the Appendix to Spearman's The Abilities 
of Man : 

Standard error of $r_{13} r_{24} - r_{23} r_{14}$

$$= \frac{2}{N^{1/2}} \left[ \bar{r}^2 (1 - r_{12} - r_{34} + \bar{r}^2) + (1 - 2\bar{r}^2) s^2 \right]^{1/2}$$

[Spearman and Holzinger's formula (16).]

where N is the number of persons in the sample,*
$\bar{r}$ is the mean of the four correlation coefficients, and
$s^2$ is their mean squared deviation (variance) from $\bar{r}$.

The probable error is .6745 times the above. A worked
example will be found on page xii of Spearman's Appendix, 
using (which is all one can do) the observed values of the r's. 

It will be remembered that in Section 7 of Chapter I 
we stated Spearman's discovery in the form " tetrad- 
differences tend to be zero." If tetrad-differences in the
whole population, however, were all actually zero, they 
would not remain exactly zero in samples, and it is only 
samples that are available to us. We are faced, therefore, 
with a twofold problem. (a) We have to decide, from the
size of the tetrad-differences actually found in our sample, 
whether the sample is compatible with the theory that the 
tetrad-differences are zero in the whole population. But 
(b) we should also go on to consider whether the sample is 
equally compatible with the opposed hypothesis that the 
tetrad-differences are not zero in the whole population, 
leaving a verdict of "not proven." (See Emmett, 1936.) 

* We use p to mean the number of persons in this book, but are
retaining N here and in " formula 16A " below to preserve the usual
appearance of these well-known and much-used expressions.

4. Distribution of a group of tetrad-differences. The
actual calculation, for every separate tetrad-difference, of
its standard error by Spearman and Holzinger's formula
(16) is, however, an almost impossibly laborious task. In
a table of correlations formed from n tests there are
$n(n - 1)/2$ correlation coefficients, and $n(n - 1)(n - 2)(n - 3)/8$
different (though not independent) tetrad-differences. Any one
particular correlation coefficient is concerned in $(n - 2)(n - 3)$
different tetrad-differences, and any one test in
$(n - 1)(n - 2)(n - 3)/2$ different tetrad-differences. Thus with
ten tests there are 630 tetrad-differences, and with twenty
tests 14,535 tetrad-differences. In the latter case, any one
test is concerned in 2,907. Under these circumstances, it is
natural to look
for a more wholesale method than that of calculating the 
standard error of each tetrad-difference. The method 
adopted by Spearman is to form a table of the distribution 
of the tetrad-differences, and compare this distribution 
with that of a normal curve centred at zero and with 
standard deviation given by 

$$\frac{2}{N^{1/2}} \left[ \bar{r}^2 (1 - \bar{r})^2 + (1 - R) s^2 \right]^{1/2}$$

[Spearman and Holzinger's formula (16A).]

where N = number of persons in the sample,
$\bar{r}$ = the mean of all the r's in the whole table,
$s^2$ = their mean squared deviation from $\bar{r}$,

$$R = 3\bar{r} \cdot \frac{n - 4}{n - 2} - 2\bar{r}^2 \cdot \frac{n - 6}{n - 2}$$

and n = number of tests.
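Both the counting of tetrad-differences and the standard deviation of formula (16A) lend themselves to a short program. The Python sketch below is a minimal illustration: the form of (16A) coded here, with $R = 3\bar{r}(n-4)/(n-2) - 2\bar{r}^2(n-6)/(n-2)$, is an assumed reading of the printed formula, and the function names and the sample figures in the last line are hypothetical.

```python
import math


def count_correlations(n):
    """Number of correlation coefficients among n tests."""
    return n * (n - 1) // 2


def count_tetrads(n):
    """Number of different tetrad-differences among n tests."""
    return n * (n - 1) * (n - 2) * (n - 3) // 8


def tetrads_per_test(n):
    """Number of tetrad-differences in which any one test is concerned."""
    return (n - 1) * (n - 2) * (n - 3) // 2


def tetrad_sd_16a(correlations, n_tests, n_persons):
    """Standard deviation of the tetrad-differences by formula (16A), as read here."""
    mean_r = sum(correlations) / len(correlations)
    s2 = sum((r - mean_r) ** 2 for r in correlations) / len(correlations)
    big_r = (3 * mean_r * (n_tests - 4) / (n_tests - 2)
             - 2 * mean_r ** 2 * (n_tests - 6) / (n_tests - 2))
    return 2 / math.sqrt(n_persons) * math.sqrt(
        mean_r ** 2 * (1 - mean_r) ** 2 + (1 - big_r) * s2
    )


print(count_tetrads(10), count_tetrads(20), tetrads_per_test(20))
# Hypothetical battery: five tests, every correlation .5, one hundred persons.
print(tetrad_sd_16a([0.5] * 10, n_tests=5, n_persons=100))
```

When all the correlations are equal, $s^2$ vanishes and the expression reduces to $2\bar{r}(1 - \bar{r})/N^{1/2}$, which makes a convenient check on the coding.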

Numerous examples of the comparison of " histograms " 
of tetrad-differences with normal curves whose standard 
deviation is found by (16A) are given in Spearman's The 
Abilities of Man. This method of establishing the hypo- 
thesis, that the tetrad-differences are derived by sampling 
from a population in which they are really zero, is open to 
the same doubt as was explained in the simpler case of 
one tetrad-difference. The comparison can prove that 
the tetrad-differences observed are compatible with that 
hypothesis. It does not in itself prove that they are 
compatible with that hypothesis only; and, as Emmett 
has shown in the article already mentioned, the odds are 
commonly rather against this.

The usual practice, moreover, is to " purify " the battery
of tests until the actual distribution of tetrad-differences 
agrees with (16A), so that in effect all that is then proved 
is that a team can be arrived at which can be described in 
terms of two factors. This, although a more modest 
claim than has often been made, and certainly less than 
is implicitly understood by the average reader, is never- 
theless a matter of some importance. Not all teams of 
tests can be explained by one common factor ; but it is 
not very difficult to find teams which can. There is little 
doubt in the minds of most workers that a tendency towards 
hierarchical order actually exists among mental tests. 

5. Spearman's saturation formula. It will be remem- 
bered from Section 4 of Chapter I that the calculation of 
the g saturation of each test forms an important part of 
the Spearman process. We saw there that in a hierarchical 
matrix each correlation is the product of the two g satura-
tions of the tests, for example

$$r_{12} = r_{1g} r_{2g}$$

Since this is so, each g saturation can be calculated
from the correlations of a test with two others, and their
inter-correlation. Thus to find $r_{1g}$ we can take Tests 2 and
3 as reference tests, when we have

$$\frac{r_{12} r_{13}}{r_{23}} = \frac{r_{1g} r_{2g} \cdot r_{1g} r_{3g}}{r_{2g} r_{3g}} = r_{1g}^2$$

When the matrix is really hierarchical, and there are 
no sampling errors present, it is immaterial which two tests 
we associate with Test 1 in order to find its g saturation. 
We have, in fact, in that case 

$$\frac{r_{12} r_{13}}{r_{23}} = \frac{r_{14} r_{15}}{r_{45}} = \frac{r_{12} r_{15}}{r_{25}} = \cdots$$

But even if the correlations, measured in the whole
population, were really exactly hierarchical, sampling
errors would make these fractions differ somewhat from
one another, and we are faced with the problem of deciding
which value to accept for the g saturation. The average
of all possible fractions like the above would be one very
plausible quantity to take but is laborious to compute.
Spearman therefore adopts a fraction

$$\frac{r_{12} r_{13} + r_{14} r_{15} + r_{12} r_{15} + \text{etc.}}{r_{23} + r_{45} + r_{25} + \text{etc.}}$$

whose numerator is the sum of the numerators, and whose
denominator is the sum of the denominators, of the single
fractions. This combined fraction he computes in a
tabular manner which we will next describe, by the
algebraically equivalent formula

$$r_{1g}^2 = \frac{A_1^2 - A_1'}{T - 2A_1}$$

[Spearman's formula (21), Appendix, Abilities of Man.]



The quantities $A_1$, $A_2$, etc., are the sums of the rows (or
columns) of the matrix of correlations without any entries
in the diagonal cells. (The arithmetical example is con-
fined to five tests to economize space) :

          1      2      3      4      5       A       A²

   1      .     .50    .34    .33    .24    1.41    1.988
   2     .50     .     .56    .32    .15    1.53    2.341
   3     .34    .56     .     .13    .35    1.38    1.904
   4     .33    .32    .13     .     .29    1.07    1.145
   5     .24    .15    .35    .29     .     1.03    1.061

                                        T = 6.42



T is the sum of all the A's, and therefore of all the 
correlations in the table (where each occurs twice). A 
new table is now written out, with each coefficient squared, 
and its rows summed to obtain the quantities A' :

          1       2       3       4       5       A'

   1      .     .250    .116    .109    .058    .533
   2    .250      .     .314    .102    .023    .689
   3    .116    .314      .     .017    .123    .570
   4    .109    .102    .017      .     .084    .312
   5    .058    .023    .123    .084      .     .288



The calculation of all the saturations is then best per-
formed in a tabular manner, thus :

 Test     A²       A'     A² - A'     2A     T - 2A    r_g²     r_g

   1    1.988    .533     1.455     2.82     3.60     .4042    .66
   2    2.341    .689     1.652     3.06     3.36     .4917    .70
   3    1.904    .570     1.334     2.76     3.66     .3645    .60
   4    1.145    .312      .833     2.14     4.28     .1946    .44
   5    1.061    .288      .773     2.06     4.36     .1773    .42



where the last column is the square root of the preceding.
The reader should calculate the six different values of
$r_{1g}$ from the original table by the formula $(r_{1j} r_{1k} / r_{jk})^{1/2}$,
for comparison with the value .66 obtained above. He
will find

      .55
      .72    .93
      .89    .48    .52

with an average of .68.
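The whole tabular calculation can be repeated in a few lines. The Python sketch below is an illustration only: it enters the five-test correlation table with zeros in the diagonal cells (so that they drop out of the sums), applies formula (21) in the form $(A^2 - A')/(T - 2A)$, and also forms the six single estimates for Test 1. The names `CORRELATIONS`, `g_saturations` and `single_estimates` are introduced here.

```python
import math
from itertools import combinations

# Observed correlations; the diagonal cells are empty and are entered as 0.
CORRELATIONS = [
    [0.00, 0.50, 0.34, 0.33, 0.24],
    [0.50, 0.00, 0.56, 0.32, 0.15],
    [0.34, 0.56, 0.00, 0.13, 0.35],
    [0.33, 0.32, 0.13, 0.00, 0.29],
    [0.24, 0.15, 0.35, 0.29, 0.00],
]


def g_saturations(r):
    """g saturation of every test by the combined fraction of formula (21)."""
    row_sums = [sum(row) for row in r]                      # the quantities A
    total = sum(row_sums)                                   # T
    squared_sums = [sum(x * x for x in row) for row in r]   # the quantities A'
    return [
        math.sqrt((a * a - a_sq) / (total - 2 * a))
        for a, a_sq in zip(row_sums, squared_sums)
    ]


def single_estimates(r, i):
    """Separate estimates of the saturation of test i, one for each pair of reference tests."""
    others = [j for j in range(len(r)) if j != i]
    return [math.sqrt(r[i][j] * r[i][k] / r[j][k]) for j, k in combinations(others, 2)]


saturations = g_saturations(CORRELATIONS)
singles = single_estimates(CORRELATIONS, 0)
print([round(s, 3) for s in saturations])
print([round(x, 2) for x in singles], round(sum(singles) / len(singles), 2))
```

Run on the table as entered here, the sketch agrees with the saturations .70, .60, .44 and .42 for Tests 2 to 5 and with the average of .68 for the six single values; for Test 1 the square root of .4042 comes out nearer .64 than the .66 quoted, a discrepancy the reader repeating the arithmetic may wish to bear in mind.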

6. Residues. If the correlations which would arise from 
these saturations or loadings are calculated, and subtracted 
from the observed correlations, we obtain the residues 
which have then to be examined to see if they are small 
enough to be attributable to sampling error. In the 
following double table of correlations are set out the ob- 
served correlations uppermost, and those calculated from 
the g saturations below. The difference is the residue, 
which may be plus or minus : 



g Loadings      .66     .70     .60     .44     .42

   .66           .      .50     .34     .33     .24
                        .46     .40     .29     .28

   .70          .50      .      .56     .32     .15
                .46             .42     .31     .29

   .60          .34     .56      .      .13     .35
                .40     .42             .26     .25

   .44          .33     .32     .13      .      .29
                .29     .31     .26             .18

   .42          .24     .15     .35     .29      .
                .28     .29     .25     .18

The lower numbers are the products of the two
saturations. In this case the residues range from -.14
to +.14 and at first sight appear in many cases to be
too large to be neglected in comparison with the original
correlations.
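The residues themselves are quickly obtained by subtraction. The Python sketch below takes the g loadings as quoted in the text (.66, .70, .60, .44, .42) and the observed correlations of the five-test example; the dictionary layout and the function name are assumptions made for the illustration.

```python
G_LOADINGS = [0.66, 0.70, 0.60, 0.44, 0.42]

# Observed correlations, keyed by the pair of test numbers.
OBSERVED = {
    (1, 2): 0.50, (1, 3): 0.34, (1, 4): 0.33, (1, 5): 0.24,
    (2, 3): 0.56, (2, 4): 0.32, (2, 5): 0.15,
    (3, 4): 0.13, (3, 5): 0.35,
    (4, 5): 0.29,
}


def residues(observed, loadings):
    """Observed correlation minus the product of the two g loadings, for every pair."""
    return {
        (i, j): r - loadings[i - 1] * loadings[j - 1]
        for (i, j), r in observed.items()
    }


res = residues(OBSERVED, G_LOADINGS)
largest = max(res, key=res.get)
smallest = min(res, key=res.get)
print(largest, round(res[largest], 2))
print(smallest, round(res[smallest], 2))
```

The largest positive residue belongs to Tests 2 and 3 and the largest negative one to Tests 2 and 5, each of about .14 in absolute size.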

To check this impression, consider the correlation .56
and the value .42 from which it is supposed to depart only
by sampling error, a deviation of .14. Fisher's z corres-
ponding to r = .42 is .45, and that corresponding to r =
.56 is z = .63, so that the z deviation is .18. The standard
deviation of z for 50 cases is $1 \div \sqrt{47} = .15$. The devia-
tion is little larger than one standard deviation and cannot
therefore be called significant. But as the reader will ob-
serve, this conclusion is due more to the large size of the
standard error than to the small size of the residue. The
residue is here attributable to sampling error, because the
latter is so large. But because the latter is large it does not
follow that the large residue is certainly due to it. A test
of the second kind is needed here (but is hardly ever ap-
plied) to determine the odds for or against the alternative
hypothesis, that the residue is not due to sampling error.
The lack of tests of this second kind, as has already been
emphasized in discussing tetrad-differences, is one of the
most serious blemishes in the treatment of data during
factorial analysis. If we are willing to allow 10 per cent.
of the correlation coefficient as being a negligible quantity
(a very generous concession), then the chance of our experi-
mental value .56 having come by sampling from outside the
area .42 ± .042 is (with 50 cases in the sample) still quite
considerable, about 5 to 1 for. These odds do not justify
us in feeling confident that .56 does come from outside
.42 ± .042. But much less do they justify us in feeling
that it comes from inside that region.
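The first part of this check, the comparison of the z deviation with its standard deviation, can be sketched as follows. The Python code uses the inverse hyperbolic tangent for Fisher's z and $1/\sqrt{p - 3}$ for its standard deviation; the variable names are introduced here, and the sketch does not attempt the calculation of the odds, whose method is not set out in the text.

```python
import math


def fisher_z(r):
    """Fisher's z corresponding to a correlation r."""
    return math.atanh(r)


observed_r = 0.56      # experimental correlation of Tests 2 and 3
expected_r = 0.42      # product of the two g saturations
persons = 50

z_deviation = fisher_z(observed_r) - fisher_z(expected_r)
z_standard_deviation = 1 / math.sqrt(persons - 3)
ratio = z_deviation / z_standard_deviation

print(round(fisher_z(expected_r), 2), round(fisher_z(observed_r), 2))
print(round(z_standard_deviation, 2), round(ratio, 2))
```

The ratio is a little over one, far short of the three standard deviations conventionally required, which is the sense in which the residue "cannot be called significant."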

7. Reference values for detecting specific correlation. If, 
after a calculation like that described, one of the residues 
is found to be too large to be explicable by sampling error, 
the excess of correlation over that due to g is attributed to 
" specific correlation," meaning correlation due to a part 
of their specific factors being not really unique but shared 
by these two tests. In the case of our numerical example,
if the number of subjects tested had been larger, the standard
errors of the coefficients would have been smaller, and some 
of the discrepancies between the experimental values and 
those calculated from the g saturations would have been 
too large to be overlooked, but would have had to be 
attributed to specific correlation. In such a case, the g 
loadings would, of course, be wrong and would have to be 
recalculated from the battery after one of the tests con- 
cerned in the specific correlation was removed from it. 
Later, the other test could be replaced in the battery 
instead of the first, and thus its g saturation found. The 
difference between the experimental correlation of the 
two, and the product of their g saturations, with a 
standard error dependent on the size of the sample, would 
be then attributed to their specific linkage. 

If two tests, v and w, are thus suspected of having a 
specific link as well as that due to g, it is clear that the 
smallest battery of tests which could be used in the above 
manner to detect that link would be one of two other tests, 
x and y, say, to make up a tetrad :

$$\begin{array}{c|cc} & v & x \\ \hline w & r_{vw} & r_{xw} \\ y & r_{vy} & r_{xy} \end{array}$$

and these two " reference " tests would have to be known 
to have no specific links with each other or with the two 
suspected tests. The example which gave rise to Figure 5 
(see Chapter I, page 15) illustrates this. Tests 2 and 3
there are, let us suppose, those with a suspected specific 
link. The tetrad-difference to be examined by means of 
Spearman's formula (16) is that which has r 23 as one corner. 
In such a case, where the two reference tests 1 and 4 are 
known to have no link except g with one another, or with 
the other two tests, two of the possible tetrad-differences 
ought to be larger than three times the standard error 
given by formula (16), and equal to one another, while the 
third tetrad-difference should be zero (or sufficiently near 
to zero, in practice) (Kelley, 1928, 67). 

The g saturation of each of the tests under examination
for specific correlation can be found by grouping it with
the two reference tests. Thus in the case of our Figure 5,
we have

$$r_{2g}^2 = \frac{r_{12} r_{24}}{r_{14}} = \frac{.5 \times .5}{.5} = .5$$

$$r_{3g}^2 = \frac{r_{13} r_{34}}{r_{14}} = \frac{.5 \times .5}{.5} = .5$$

Therefore the correlation between 2 and 3 which is due
to g is

$$r_{2g} r_{3g} = \sqrt{.5} \times \sqrt{.5} = .5$$

and the difference between this and .8, the actual value,
is the part to be explained by the specific factor shared by
these two tests. The difference of .3 is not what is called
the specific correlation itself, it should be remarked, but 
only its numerator. By specific correlation is meant the 
correlation between the two " specific " parts of the linked 
tests, due to these not being entirely unique, but having a 
part in common. How to calculate this we shall see after 
considering the effect of selection on correlation, in Chapter 
XI, end of Section 2 (page 175). 
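
The arithmetic of this small example can be retraced with a short sketch. Python is used here purely for illustration; the function name `g_saturation_sq` and the variable names are assumptions of this sketch, not notation from the text, and the correlations are the Figure 5 values quoted above.

```python
import math

def g_saturation_sq(r_ia, r_ib, r_ab):
    """Squared g saturation of test i, found from two reference tests a and b."""
    return r_ia * r_ib / r_ab

# Figure 5 values as quoted in the text: reference tests 1 and 4,
# suspected tests 2 and 3.
r12 = r13 = r14 = r24 = r34 = 0.5
r23 = 0.8

sat2_sq = g_saturation_sq(r12, r24, r14)   # squared g saturation of Test 2
sat3_sq = g_saturation_sq(r13, r34, r14)   # squared g saturation of Test 3

due_to_g = math.sqrt(sat2_sq) * math.sqrt(sat3_sq)  # part of r23 due to g
residue = r23 - due_to_g                            # numerator of the specific correlation

print(round(due_to_g, 4), round(residue, 4))
```

The residue printed is only the numerator of the specific correlation, as the text remarks.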

When there are several reference tests available, all 
believed to have no link except g with one another or with 
the two tests suspected of specific overlap, there will be 
a number of ways of picking two of them to obtain the 
tetrad required to decide the matter, and the results will, 
because of sampling and other errors, be discrepant. Under 
these circumstances Spearman has devised an interesting 
procedure for amalgamating the results into one, which 
we can describe with the aid of the Pooling Square. Instead 
of using two single tests, let us in the first place imagine
that the n tests available as reference tests are divided into
two pools of $n/2$ tests each, and that the correlations
of these pools with one another, and with the suspected
tests, are used to form the tetrad. Following Spearman's
notation in paragraph 9 of the Appendix to The Abilities
of Man, we shall call the suspected tests v and w, and the
two pools the x pool and the y pool. We then want the
tetrad of correlation coefficients :





$$\begin{array}{c|cc} & w & y\ \text{pool} \\ \hline v & r_{vw} & r_{vy} \\ x\ \text{pool} & r_{xw} & r_{xy} \end{array}$$



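of which $r_{vw}$ is known experimentally. The others we can
find by using pooling squares. Take first $r_{xy}$. We have
(writing three tests in each reference pool instead of $n/2$) :

$$\begin{array}{c|ccc|ccc}
 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
1 & 1 & r_{12} & r_{13} & r_{14} & r_{15} & r_{16} \\
2 & r_{12} & 1 & r_{23} & r_{24} & r_{25} & r_{26} \\
3 & r_{13} & r_{23} & 1 & r_{34} & r_{35} & r_{36} \\ \hline
4 & r_{14} & r_{24} & r_{34} & 1 & r_{45} & r_{46} \\
5 & r_{15} & r_{25} & r_{35} & r_{45} & 1 & r_{56} \\
6 & r_{16} & r_{26} & r_{36} & r_{46} & r_{56} & 1
\end{array}$$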



and the correlation $r_{xy}$ of the two pools with one another
is (Chapter VI, Section 2)



Here the quantities $\bar r_a$, $\bar r_b$, and $\bar r_c$ are the mean values of
the correlation coefficients (excluding the units) to be found
in the quadrants of the pooling square, thus :



Now, there is clearly an arbitrary factor left in this 
procedure, inasmuch as the division of the n available tests 
into an x pool and a y pool can be made in many different 
ways, in each of which the mean values $\bar r_a$, $\bar r_b$, and $\bar r_c$ will be
slightly different. To obviate this, Spearman takes the
mean value $\bar r$ of all the n reference tests with one another
instead of each of these three means, upon which the
formula for $r_{xy}$ simplifies to

$$r_{xy} = \frac{\frac{n}{2}\,\bar r}{1 + \left(\frac{n}{2} - 1\right)\bar r}$$

and it is this value which he uses in the tetrad.

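Similarly, the correlation of the test w with the x pool
can be found by a pooling square :

$$\begin{array}{c|c|ccc}
 & w & x_1 & x_2 & x_3 \\ \hline
w & 1 & r_{w1} & r_{w2} & r_{w3} \\ \hline
x_1 & r_{w1} & 1 & r_{12} & r_{13} \\
x_2 & r_{w2} & r_{12} & 1 & r_{23} \\
x_3 & r_{w3} & r_{13} & r_{23} & 1
\end{array}$$

Its value is

$$r_{wx} = \frac{\frac{n}{2}\,\bar r_w}{\sqrt{\frac{n}{2}\left[1 + \left(\frac{n}{2} - 1\right)\bar r\right]}}$$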






Here, for the same reason as before, not only do we use
the average inter-correlation of all the reference tests for $\bar r$,
but for $\bar r_w$ we use the average of the correlations of the
test w with all the reference tests and not merely with the
x pool, for the x pool could be any half of them.

Similarly the correlation $r_{vy}$ is found. Thus to form the
tetrad all that we need do is to find :

$\bar r$, the average correlation of all the reference tests
with one another ;
$\bar r_w$, the average correlation of all the reference tests
with w ;
$\bar r_v$, the average correlation of all the reference tests
with v ;

and substitute in the formulae. A numerical example is
given by Spearman on page xxii of his Appendix.



CHAPTER X 

MULTIPLE-FACTOR ANALYSIS WITH 
FALLIBLE DATA 

1. Method of approximating to the communalities. The 
influence of sampling errors on multiple-factor analysis is 
in general similar to that on the tetrad method. Sampling 
errors blur the picture. They make it both difficult for 
us to see the true outlines and easy to entertain hypotheses 
which cannot be disproved, though often they cannot be 
proved either, by the data. 

With artificial data like the examples used in Chapter II 
it may be laborious, but is not impossible, to find the actual 
rank of the matrix with various communalities, and thus 
to arrive by trial at the minimum rank. But when 
sampling errors are present, or any kind of errors, the 
question becomes at once immensely more difficult. We 
have seen in the previous chapter something of the diffi- 
culty of deciding from the size of the tetrad-differences 
when the rank of a matrix may justifiably be regarded as 
one. Such methods have not been used for higher ranks. 
The labour of calculating all three-rowed, four-rowed, or 
larger minors, setting out their distribution and comparing 
it with that to be anticipated from true zero values plus 
sampling error, is too great, and the mathematical difficulty 
not slight. What has been done is to judge of the rank 
by the inspection of the residues left after the removal of 
so-and-so many common factors, e.g. at the end of so-and- 
so many cycles of Thurstone's process, just as in Section 6 
of the preceding chapter we examined the residues left 
after one common factor was removed. But we must first 
show how Thurstone meets the difficulty of the unknown 
communalities. 

Thurstone has described many ways of estimating the 
communalities, and articles still issue from his laboratory 
on this subject (see in the Addendum on page 854 a brief
account of a paper by Medland). He points out, however,
that if the number of tests is fairly large, an exact estimate 
is not very important, and can in any case be improved by 
iteration, using the sums of squares of the loadings for a 
new estimate. 

His practice is to use as an approximate communality 
the largest correlation coefficient in the column (Vectors, 
89). That this is plausible can be seen from a consideration
of the case where there is only one factor,
when the communality of Test 1 would be $r_{12}r_{13}/r_{23}$,
which is likely to be roughly equal to either $r_{12}$ or $r_{13}$ if
these correlate highly with Test 1 and probably therefore
with each other.

We shall illustrate this approximate method of Thur- 
stone's on the same example as we used near the end of 
Chapter II, for the sake of comparison and for ease in 
arithmetical computation, even although that example is 
really an exact and artificial one unclouded by sampling 
error. Inserting then the highest coefficients in each 
column we get : 

          (.5883)    .4        .4        .2        .5883
           .4       (.7)       .7        .3        .2852
           .4        .7       (.7)       .3        .2852
           .2        .3        .3       (.3)       .1480
           .5883     .2852     .2852     .1480    (.5883)

Sums      2.1766    2.3852    2.3852    1.2480    1.8950   = 10.0900 = 3.1765²

First
Loadings   .6852     .7509     .7509     .3929     .5966

The communalities which really give the minimum rank
are, as we saw in Section 9 of Chapter II,

           .7        .7        .7        .1303     .5

and the correct first-factor loadings obtained by their use are

           .7257     .7564     .7564     .3420     .5729

With a large battery the difference between the loadings
obtained by the approximation and by the correct communalities
would be much less. For the "centroid" method
depends on the relative totals of the columns of the correlation
matrix ; and when there are twenty or more tests,
these relative totals will not be seriously changed by the
exact value given to the communality in the column.
When the number of tests is large, the influence of the one
communality in each column is swamped by the influence
of the numerous correlations.
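
How little the first loadings depend on the diagonal entries can be seen by computing them twice, once with the largest coefficient of each column and once with the true communalities. The following is a minimal sketch in Python, assuming only the rule stated above (each column total divided by the square root of the grand total); the names `centroid_loadings` and `with_diagonal` are illustrative and not from the text.

```python
import math

# Correlations of the five-test example; diagonal cells are filled in below.
R = [
    [0.0,    0.4,    0.4,    0.2,    0.5883],
    [0.4,    0.0,    0.7,    0.3,    0.2852],
    [0.4,    0.7,    0.0,    0.3,    0.2852],
    [0.2,    0.3,    0.3,    0.0,    0.1480],
    [0.5883, 0.2852, 0.2852, 0.1480, 0.0],
]

def with_diagonal(r, diag):
    """Copy of r with the given communalities placed in the diagonal cells."""
    n = len(r)
    return [[diag[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]

def centroid_loadings(m):
    """Column totals divided by the square root of the grand total."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    root_total = math.sqrt(sum(col_sums))
    return [s / root_total for s in col_sums]

trial = centroid_loadings(with_diagonal(R, [0.5883, 0.7, 0.7, 0.3, 0.5883]))
true = centroid_loadings(with_diagonal(R, [0.7, 0.7, 0.7, 0.1303, 0.5]))

print([round(x, 4) for x in trial])
print([round(x, 4) for x in true])
```

With only five tests the two sets of loadings differ in the second decimal place; with a longer battery each diagonal cell would contribute a much smaller share of its column total.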

The process now goes on as in Chapter II, and the resid- 
uals left after subtraction of the first-factor matrix check 
by summing in each column to zero, as there. 

Before, however, proceeding any farther, in this approxi- 
mate method we delete the quantities in the diagonal (the 
residues of the guessed communalities) and replace them by 
the largest coefficient in the column regardless of its sign, 
which we change to plus in the diagonal cell if it is negative 
in its own cell. The reason for this is apparent, especially 
when, as may and does happen, the existing diagonal 
residues are negative, which is theoretically impossible. 
For although the guessing of the first communalities does 
not in a large battery make much difference to the first- 
factor loadings, it may make a big difference to the diagonal 
residues. If the battery is very large indeed, our first- 
factor loadings would come out much the same, even if we 
entered zero for every communality, but the diagonal 
residues would then all be negative. In short, the diagonal 
residues are much the least trustworthy part of the calcu- 
lation when approximate communalities are used, and it is 
better to delete them at each stage and make a new 
approximation. 

2. Illustrated on the Chapter II example. To make this 
clearer, the whole approximate process is here set out for 
our small example as far as the second residual matrix. 
The explanations printed alongside the calculation will 
make each stage clear. It is important to form the residual 
matrices exactly as instructed, as otherwise the check of 
the columns summing to zero will not work. In practice, 
certainly if a calculating machine were being used, several 
of the matrices here printed for clearness would be omitted ; 
for example, with a machine one would go straight from
A to C, while D and E would be made by actually altering
C itself :





A.  Largest r of each column inserted in the diagonal cell.

          (.5883)    .4        .4        .2        .5883
           .4       (.7)       .7        .3        .2852
           .4        .7       (.7)       .3        .2852
           .2        .3        .3       (.3)       .1480
           .5883     .2852     .2852     .1480    (.5883)

Sums      2.1766    2.3852    2.3852    1.2480    1.8950   = 10.0900 = 3.1765²
Loadings I
           .6852     .7509     .7509     .3929     .5966   = 3.1765

B.  First-factor matrix (the loadings I are repeated at the left).

  .6852   (.4695)    .5145     .5145     .2692     .4088
  .7509    .5145    (.5639)    .5639     .2950     .4480
  .7509    .5145     .5639    (.5639)    .2950     .4480
  .3929    .2692     .2950     .2950    (.1544)    .2344
  .5966    .4088     .4480     .4480     .2344    (.3559)

C.  First residual matrix, A - B.

          (.1188)   -.1145    -.1145    -.0692     .1795
          -.1145    (.1361)    .1361     .0050    -.1628
          -.1145     .1361    (.1361)    .0050    -.1628
          -.0692     .0050     .0050    (.1456)   -.0864
           .1795    -.1628    -.1628    -.0864    (.2324)

           .0001    -.0001    -.0001     .0000    -.0001   Columns check to zero.

D.  Largest r of each column (regardless of sign) inserted in each diagonal cell.

          (.1795)   -.1145    -.1145    -.0692     .1795
          -.1145    (.1628)    .1361     .0050    -.1628
          -.1145     .1361    (.1628)    .0050    -.1628
          -.0692     .0050     .0050    (.0864)   -.0864
           .1795    -.1628    -.1628    -.0864    (.1795)

           .6572     .5812     .5812     .2520     .7710   Sum disregarding signs.

E.  Signs of Tests 2, 3, and 4 changed to make largest column (.7710) all positive.

          (.1795)    .1145     .1145     .0692     .1795
           .1145    (.1628)    .1361     .0050     .1628
           .1145     .1361    (.1628)    .0050     .1628
           .0692     .0050     .0050    (.0864)    .0864
           .1795     .1628     .1628     .0864    (.1795)

Algebraic
Sum        .6572     .5812     .5812     .2520     .7710   = 2.8426 = 1.6860²
Loadings II
           .3898     .3447     .3447     .1495     .4573   (With temporary signs.)

F.  Second-factor matrix, using temporary signs (the loadings II are repeated at the left).

  .3898   (.1519)    .1344     .1344     .0583     .1783
  .3447    .1344    (.1188)    .1188     .0515     .1576
  .3447    .1344     .1188    (.1188)    .0515     .1576
  .1495    .0583     .0515     .0515    (.0224)    .0683
  .4573    .1783     .1576     .1576     .0683    (.2091)

G.  Second residual matrix, E - F.

          (.0276)   -.0199    -.0199     .0109     .0012
          -.0199    (.0440)    .0173    -.0465     .0052
          -.0199     .0173    (.0440)   -.0465     .0052
           .0109    -.0465    -.0465    (.0640)    .0180
           .0012     .0052     .0052     .0180   (-.0296)

          -.0001     .0001     .0001    -.0001     .0000   Columns check to zero.



Notes. It is fortuitous that all the entries in E are positive. 
Usually some will be negative. 

In the check for the residual matrices, a discrepancy from zero 
in the last figure is often to be expected, even of three or four units 
in a large matrix. 

Note the negative value occurring in a diagonal cell in G. 

Further stages would be carried on in the same way. 
But at each stage the residues will be examined to see if 
further analysis is worth while, by methods indicated in 
Section 4 below. Meanwhile let us assume in the present 
example that no more factors need be extracted. 

The matrix of loadings of common factors thus arrived 
at is, after we have replaced the proper signs in Loadings II: 



                Approximate Method                   True Values
Test         I          II       Communality         Communality

 1         .6852      .3898        .6214               .7000
 2         .7509     -.3447        .6827               .7000
 3         .7509     -.3447        .6827               .7000
 4         .3929     -.1495        .1767               .1303
 5         .5966      .4573        .5651               .5000

                                  2.7286              2.7303



The communalities .6214, etc., are the sums of the
squares of the two loadings. For comparison with the
approximate communalities thus obtained there are shown
the true values, which in this artificial case are known to
us (see Chapter II, Section 9). This is for instructional
purposes only ; the comparison is not intended as any
criticism of Thurstone's method of approximation. As
has been explained, this method is used only on large
batteries, and it is a very severe test indeed to employ it
on a battery of only five tests.

We might now go back and begin our whole calculation
again, using the communalities .6214, etc., arrived at by
the first approximation. This does not seem often to be 
done in practice, most workers being content with the 
approximation first arrived at. If we repeat the calcula- 
tion again and again with our present example, on each 
occasion using as communalities the sum of the squares of 
the loadings given by the preceding calculation, we get the 
following sets of closer and closer approximation to the 
true communalities : * 





                               h₁²      h₂²      h₃²      h₄²      h₅²

First trial communalities     .5883    .7000    .7000    .3000    .5883
Next approximation            .6214    .6827    .6827    .1767    .5651
Next approximation            .6381    .6970    .6970    .1477    .5392
Next approximation            .6535    .7043    .7043    .1397    .5253
True values                   .7000    .7000    .7000    .1303    .5000
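
The repeated calculation can be sketched as follows. This is an illustrative Python outline, not Thurstone's own routine: the function names are assumptions, only two factors are extracted, and the sign-changing step is simplified to reflecting every test that is negative in the column with the largest absolute total, which suffices for this example but would not in general.

```python
import math

R = [  # correlations of the five-test example; diagonal cells are overwritten
    [0.0,    0.4,    0.4,    0.2,    0.5883],
    [0.4,    0.0,    0.7,    0.3,    0.2852],
    [0.4,    0.7,    0.0,    0.3,    0.2852],
    [0.2,    0.3,    0.3,    0.0,    0.1480],
    [0.5883, 0.2852, 0.2852, 0.1480, 0.0],
]

def centroid(m):
    """Column totals divided by the square root of the grand total."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    root = math.sqrt(sum(sums))
    return [s / root for s in sums]

def two_factor_communalities(r, diag):
    """One complete pass: two centroid factors, returning sums of squared loadings."""
    n = len(r)
    a = [[diag[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]
    f1 = centroid(a)
    c = [[a[i][j] - f1[i] * f1[j] for j in range(n)] for i in range(n)]
    # Discard the diagonal residues; insert the largest residual of each column.
    for j in range(n):
        c[j][j] = max(abs(c[i][j]) for i in range(n) if i != j)
    # Simplified sign change: reflect tests negative in the largest column.
    abs_sums = [sum(abs(c[i][j]) for i in range(n)) for j in range(n)]
    k = abs_sums.index(max(abs_sums))
    sign = [-1 if c[i][k] < 0 else 1 for i in range(n)]
    e = [[sign[i] * sign[j] * c[i][j] for j in range(n)] for i in range(n)]
    f2 = [sign[i] * x for i, x in enumerate(centroid(e))]  # proper signs restored
    return [f1[i] ** 2 + f2[i] ** 2 for i in range(n)]

history = [[0.5883, 0.7, 0.7, 0.3, 0.5883]]  # first trial communalities
for _ in range(3):
    history.append(two_factor_communalities(R, history[-1]))

for row in history:
    print([round(x, 4) for x in row])
```

Because the sketch carries full precision from one pass to the next, its figures may differ from the printed ones by a unit or so in the fourth decimal place.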

The example has served to show how to work Thurstone's 
method of approximating to the communalities. It should 
be emphasized again that, being composed of only five tests, 
it is not a suitable example to employ in criticism of that 
method, and it is not so used here, but only as an illustra- 
tion. Being an artificial example, and not really overlaid 
with sampling error, it has had the advantage of allowing 
us to compare the approximations with the true values. 
But it must be remembered that a real experimental
matrix is not likely to have an exact low rank to which
approximation can converge as here. In that case the
approximations will presumably give an indication of the
low rank which the matrix nearly has, which it might be
made to have by adjustments in its elements within the
limits of their sampling errors.

* It should be pointed out that iteration of each factor extraction
separately will not give the same result as iteration of all. Iteration
of the first factor will give the best approximation to rank one in the
correlation matrix ; iteration of factor II the best approximation to
rank one in the residues ; and so on. But this is not the same thing
as approximating to the lowest rank of the whole matrix.

We might, indeed, have dealt with this method in 
Chapter II, quite unconnected with sampling errors, 
regarding it as a method of finding the communalities by 
successive approximations. It has, however, been left to
the present chapter because in actual practice it is asso- 
ciated with the difficulty of finding communalities because 
of sampling error, and also is not generally used as a 
repetitive process. The labour of repeating the whole 
calculation with new approximations to the communalities 
has been a deterrent, and the further fact that with large 
batteries the improvement produced is very small. Usually, 
therefore, the experimenter is content with the factor 
loadings first obtained. It is a great drawback of the 
method, especially in this form, that any mathematical 
expression of the standard errors of the resulting loadings 
is almost impossible, by reason of the chance nature of 
the approximations made at each stage (see McNemar, 
1941). On the other hand, the method does give load- 
ings which will imitate the experimental correlations to 
any desired degree of exactness, and does so with not very 
laborious arithmetic. 

3. Error specifics. We shall consider next the influence 
of sampling errors upon the specific factors of tests. We 
have hitherto used the term " specific " to mean all that 
part of a test ability which is unique to that test. There 
is a tendency, however, to confine the term specific factor 
to that non-communal part of the test ability which is not 
due to any kind of error, and to use " uniqueness " for 
both the true specific and the error specifics. 

Now it is not at all obvious that sampling errors in the 
correlation coefficients will produce only unique factors. 
Rather the contrary. In general, they will produce new 
common factors, for the sampling errors of correlation 
coefficients are themselves correlated. Pearson and Filon
gave the formulae for such correlation in 1898. The correlation
coefficient of the sampling errors of $r_{12}$ and $r_{13}$ (where
one of the tests occurs in each correlation) is roughly somewhat
less than $r_{23}$, for positive correlations. The correlation
coefficient of the sampling errors of $r_{12}$ and $r_{34}$, on the
other hand, is a much smaller quantity of the second order
only. The result of this is (Thomson, 1919a, 406) that 
sampling errors tend to produce, not irregular ups and downs 
of the correlations, but a ridged effect, with a general 
upward, or a general downward, tendency. In other 
words, the error factors are, or include, common factors. 
Some of the unique variance of the tests may be due to 
sampling errors : but so will some of the communality of 
the tests. 

4. The number of common factors. As was indicated in 
Section 2 above, it is necessary to examine the residues left 
after each factor is removed, to see if it is worth while 
continuing. When the residues are so small that they may 
merely be due to random error (sampling or other error) 
it would seem to be futile to continue. There are certain 
snags here, however, connected with the skew sampling 
distribution of a correlation coefficient unless the true 
value is quite small, and with the fact mentioned in the 
preceding paragraph that sampling errors in correlation 
coefficients are themselves correlated with one another. 

The earliest method was to compare each residue with 
the standard error of the original correlation coefficient and 
cease factorizing when the residues all sank below twice 
these standard errors. But, as we have said on page 150, 
the use of the formula for the standard error of r is now 
frowned upon because of the skewness of the distribution. 

Moreover, sampling errors in the correlation coefficients, 
being themselves correlated, produce further factors ; and 
the above-mentioned test tended to stop the analysis too 
soon (Wilson and Worcester, 1939). These further factors 
must be taken out in order to give elbow room for rotation 
of the axes to some psychologically significant position. 
For the error factors are not concentrated in the last centroid
or other factors taken out, but have been entangled
with all the centroids. Usually more factors have to be
taken out than can be expected, on rotation, to yield
meaningful psychological factors, but all the dimensions
are required nevertheless for the rotations. In geometrical
terms, some of the dimensions of the common factor space
will be due to sampling error, but not the particular dimensions
indicated by the directions of the last factors to
be extracted. In terms of Hotelling's plan, the whole
ellipsoid is distorted ; its small major axes are not necessarily
due entirely to sampling, nor its large ones free from
it. A $\chi^2$ method is described by Wilson and Worcester
(1939, 139) which is, however, laborious when the number 
of tests is large. See also Burt (1940, 338-40). Lawley 
(1940, 76 et seq.) repeated Wilson and Worcester's criticism 
and developed an accurate criterion described in Chapter 
XXI. It is, however, only legitimate when the factor 
loadings have been found by Lawley's application of the 
method of maximum likelihood. 

There have been various suggestions, mainly empirical, 
for an easily applied criterion to decide when to stop 
factorizing. Thurstone (1938, 65 et seq.) discusses some 
of the earlier ones. 

Ledyard Tucker's criterion is that the ratio of the sums
of the absolute values of the residuals, including the
diagonal used, just after and just before the extraction of
a factor must be less than $(n - 1)/(n + 1)$, where n is the
number of tests.
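
As an illustration only, the ratio can be formed for the second factor of the five-test example of Section 2, taking matrix E as the residuals just before and matrix G as those just after extraction. Treating the diagonal cells of G as printed (rather than re-estimated) is an assumption of this sketch, as are the names used.

```python
def abs_sum(m):
    """Sum of the absolute values of every cell, diagonal included."""
    return sum(abs(x) for row in m for x in row)

E = [  # residuals before the second factor is taken out
    [0.1795, 0.1145, 0.1145, 0.0692, 0.1795],
    [0.1145, 0.1628, 0.1361, 0.0050, 0.1628],
    [0.1145, 0.1361, 0.1628, 0.0050, 0.1628],
    [0.0692, 0.0050, 0.0050, 0.0864, 0.0864],
    [0.1795, 0.1628, 0.1628, 0.0864, 0.1795],
]
G = [  # residuals after the second factor is taken out
    [0.0276, -0.0199, -0.0199, 0.0109, 0.0012],
    [-0.0199, 0.0440, 0.0173, -0.0465, 0.0052],
    [-0.0199, 0.0173, 0.0440, -0.0465, 0.0052],
    [0.0109, -0.0465, -0.0465, 0.0640, 0.0180],
    [0.0012, 0.0052, 0.0052, 0.0180, -0.0296],
]

n = len(E)
ratio = abs_sum(G) / abs_sum(E)   # "just after" over "just before"
bound = (n - 1) / (n + 1)
meets_criterion = ratio < bound

print(round(ratio, 4), round(bound, 4), meets_criterion)
```

With a battery of only five tests the bound is very lenient, so the example shows the mechanics rather than a realistic decision.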

Coombs' criterion depends upon the number of negative 
signs left among the residuals after everything has been 
done to reduce them by sign-changing, in the centroid 
process. If they are few, another factor may be extracted. 
More exactly, the permissible number is given in this table : 

Number of tests  .     10     15     20     25     30

Negative signs . .     31     79    149    242    358
Standard error   .      5      7     10     12     15

A fuller table is given in Coombs' article (1941). 

An example of the use of these two will be found in 
Blakey (1940, 126). 

Quinn McNemar (1942), who considers both of the
above inadequate, gives a formula which includes N, the
size of the sample. He takes out factors until the quantity
$\sigma_s \div (1 - M_{h^2})$ reaches or falls below $1/\sqrt{N}$, where

$\sigma_s$ = st. dev. of the residuals after s factors,
$M_{h^2}$ = mean communality for s factors.
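
A minimal sketch of this stopping rule, read as stated above, is given below. The function name and the numerical inputs are hypothetical; how the standard deviation of the residuals is to be computed (for instance, whether diagonal cells are included) is left to the user, since the text does not say.

```python
import math

def mcnemar_stop(sigma_s, mean_communality, sample_size):
    """Return the adjusted residual scatter and whether it has reached 1/sqrt(N)."""
    adjusted = sigma_s / (1 - mean_communality)
    return adjusted, adjusted <= 1 / math.sqrt(sample_size)

# Hypothetical figures: residual st. dev. .03, mean communality .55, N = 200.
adjusted, stop = mcnemar_stop(0.03, 0.55, 200)
print(round(adjusted, 4), stop)
```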

Others go on until the distribution of the residuals 
ceases to be significantly skew (Swineford, 1941, 378). 
Reyburn and Taylor (1939) divide the residuals by the 
probable errors of the original coefficients, and plot a 
distribution of the results disregarding sign. If it is 
significantly different from a normal curve of the same area 
and with standard deviation 1.4825, they take out more
factors. Swineford (1941, 377) finds the correlation 
between the original correlations and the corresponding 
residuals and takes out factors till it is not significant. 
Hotelling's principal components lend themselves to more 
exact treatment. Hotelling himself (1933, 437-41) 
discusses the matter of the number which are significant. 
Davis (1945) shows how to find the reliability of each prin- 
cipal component from the reliabilities of the tests, and 
finds that it may happen that a later component is more 
reliable than an earlier one. 

The effect of sampling errors on factors and factorial 
analyses is indeed a very complex business, and it is advis- 
able to discuss how deliberate sampling of the population 
(whether human selection or natural selection) modifies 
analyses. 



CHAPTER XI 

THE INFLUENCE OF UNIVARIATE SELECTION 
ON FACTORIAL ANALYSIS* 

1. Univariate selection. All workers with intelligence 
tests know, or ought to know, that the correlations found 
between tests, or between tests and outside criteria, depend 
to a very great extent indeed upon the homogeneity or 
heterogeneity of the sample in which the correlations were 
measured. If, to take the usual illustration, we measure 
the correlation between height and weight in a sample of 
the population which includes babies, children, and grown- 
ups, we shall obviously get a very high result. If we 
confine our measurement to young people in their 'teens, 
we shall usually get a smaller value for the coefficient of 
correlation. If we make the group more homogeneous 
still, taking, say, only boys, and all of the same race and 
exactly the same age, the correlation of height and weight 
will be still less.† Through all these changes towards
greater homogeneity in age, the standard deviation (or its 
square, the variance) of height has also been sinking, and 
the standard deviation of weight also. The formulae which 
describe these changes (in samples normally distributed, 
at any rate) were given in 1902 by Professor Karl Pearson, 
and when the selection of the persons forming the sample 
is made on the basis of one quality only, these formulae 
can be put into the following very simple form. 

Let the standard deviations of (say) four qualities be
(in the complete population ; we must, of course, in each
case define what we mean by the complete population, as
for example all living adults who were born in Scotland)
given by $\Sigma_1$, $\Sigma_2$, $\Sigma_3$, and $\Sigma_4$, and their correlations by
$R_{12}$, $R_{13}$, etc. Now let a selection of persons be made who
are more homogeneous in the first quality (say, in an
intelligence test which has been given to them all) so that
its standard deviation in the sample is only $\sigma_1$, and write

$$p_1 = \frac{\sigma_1}{\Sigma_1}.$$

The smaller $p_1$ is, the more homogeneous the group is in
intelligence-test score. If we write

$$q_1 = \sqrt{1 - p_1^2},$$

$q_1$ will be larger, the greater the shrinkage in intelligence
score-scatter from $\Sigma_1$ to $\sigma_1$. We shall call $q_1$ the "shrinkage"
of the quality No. 1 in the sample.

* Thomson, 1937 and 1938b.

† Greater homogeneity need not necessarily, in the mathematical
sense, decrease correlation, and occasionally it does not do so in
actual psychological experiments. But it almost always does so.

The other qualities 2, 3, and 4, being correlated with the
first, will tend to shrink with it, and their expected shrinkages
$q_2$, $q_3$, and $q_4$ can be calculated from the formula

$$q_i = q_1 R_{1i}.$$

For the sort of reason indicated earlier in this paragraph,
the correlations of the four qualities (which we are for
simplicity in exposition assuming to be positively correlated
in the whole population) will also alter, according to the
formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}.$$


A numerical example will illuminate these formulae. Let
us define our "whole population" as all the eleven-year-old
children in Massachusetts, and let us suppose (the
numbers are entirely fictitious) that the standard deviations
of all their scores in four tests are :

1. Stanford-Binet test        16.5 = $\Sigma_1$,
2. The X reading test         24.9 = $\Sigma_2$,
3. The Y arithmetic test      27.3 = $\Sigma_3$,
4. The Z drawing scale        14.2 = $\Sigma_4$,

while the correlations between these four, in a State-wide
survey, are (these are the R correlations) :

          1       2       3       4

 1        .      .69     .75     .32
 2       .69      .      .54     .18
 3       .75     .54      .      .06
 4       .32     .18     .06      .



Now let a sample of Massachusetts eleven-year-olds be
taken who are less widely scattered in intelligence, with
a standard deviation in their Stanford-Binet scores of
only 10.2. How will all the other quantities listed above
tend to alter in this sample ? We have, using the formulae
quoted, the following :

$$p_1 = \frac{10.2}{16.5} = .618$$

$$q_1 = \sqrt{1 - .618^2} = .786$$

and from $q_i = q_1 R_{1i}$ we have the other shrinkages q, and
thence the coefficients p and the new standard deviations $\sigma$ :

          1       2       3       4

 q      .786    .542    .590    .252
 p      .618    .840    .808    .968
 σ      10.2    20.9    22.1    13.7



The formula for $r_{ij}$ then enables us at once to calculate
the correlations to be expected in the sample, namely :

          1       2       3       4

 1        .      .509    .574    .204
 2      .509      .      .325    .054
 3      .574    .325      .     -.113
 4      .204    .054   -.113      .

The greater homogeneity in the sample has made all the
correlation coefficients smaller, and has indeed made $r_{34}$
become negative.
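
The whole of this fictitious example can be retraced with a short sketch built only on the formulae quoted, with $p_i = \sqrt{1 - q_i^2}$ for every quality. The function name `univariate_selection` and the variable names are assumptions of this illustration; the case $p_1 = 0$ is excluded because the formula then gives 0/0 for the selected test.

```python
import math

def univariate_selection(R, sd_population, sd1_sample):
    """Expected shrinkages q, ratios p, sample st. devs. and sample correlations
    when the sample is selected on the first quality only."""
    n = len(R)
    p1 = sd1_sample / sd_population[0]
    q1 = math.sqrt(1 - p1 ** 2)
    q = [q1 if i == 0 else q1 * R[0][i] for i in range(n)]
    p = [math.sqrt(1 - qi ** 2) for qi in q]
    sd = [pi * s for pi, s in zip(p, sd_population)]
    r = [[1.0 if i == j else (R[i][j] - q[i] * q[j]) / (p[i] * p[j])
          for j in range(n)] for i in range(n)]
    return q, p, sd, r

# Fictitious Massachusetts figures from the text.
R = [
    [1.00, 0.69, 0.75, 0.32],
    [0.69, 1.00, 0.54, 0.18],
    [0.75, 0.54, 1.00, 0.06],
    [0.32, 0.18, 0.06, 1.00],
]
SD = [16.5, 24.9, 27.3, 14.2]

q, p, sd, r = univariate_selection(R, SD, 10.2)
print([round(x, 3) for x in q])
print([round(x, 3) for x in p])
print([round(x, 1) for x in sd])
for row in r:
    print([round(x, 3) for x in row])
```

Since the sketch carries full precision throughout, a figure may differ by a unit in the third decimal place from one worked with three-figure intermediate values.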

2. Selection and partial correlation. If a sample is made
completely homogeneous in the Stanford-Binet test,
clearly $p_1 = 0$ and $q_1 = 1$. The same formulae, with
$p_i = \sqrt{1 - q_i^2}$, then give us :

          1       2       3       4

 q        1      .69     .75     .32
 p        0      .724    .661    .947
 σ        0      18.0    18.1    13.5

and the resulting correlation coefficients, which in this case
are called "coefficients of partial correlation for constant
Stanford-Binet score," are, by the same formula :

          1       2       3       4

 1        .       .       .       .
 2        .       .      .047   -.059
 3        .      .047     .     -.287
 4        .     -.059   -.287     .

The correlations of the Stanford-Binet test with the
others are given by the formula as 0/0, that is, indeterminate.
That they are really zero is seen from the fact
that when $p_1$ is taken as not quite zero, but very small,
these correlations come out by the formula as very small.
They vanish with $p_1$.

In this special case of "partial correlation," where the
directly selected test is so stringently selected that everyone
in the sample has exactly the same score in it, our formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

has a more familiar form. For since

$$q_i = q_1 R_{1i} \quad \text{and} \quad q_1 = 1$$

in this case of complete shrinkage we have

$$q_i = R_{1i} \quad \text{and} \quad p_i = \sqrt{1 - R_{1i}^2}$$

so that our formula becomes

$$r_{ij} = \frac{R_{ij} - R_{1i} R_{1j}}{\sqrt{1 - R_{1i}^2}\,\sqrt{1 - R_{1j}^2}},$$

the usual form of a partial correlation coefficient. Its
more conventional notation is, calling the test which is
made constant test k instead of Test 1,

$$r_{ij \cdot k} = \frac{r_{ij} - r_{ik} r_{jk}}{\sqrt{1 - r_{ik}^2}\,\sqrt{1 - r_{jk}^2}}.$$

If the "test" which is held constant is the factor g,
this becomes

$$r_{ij \cdot g} = \frac{r_{ij} - r_{ig} r_{jg}}{\sqrt{1 - r_{ig}^2}\,\sqrt{1 - r_{jg}^2}},$$

which is called the "specific correlation" between i and j.
As we said in Section 7 of Chapter IX, its numerator is
the "residue" left after removing the correlation due to g.
If g is the sole cause of correlation, holding g constant will
destroy the correlation and we shall have

$$r_{ij} = r_{ig} r_{jg}$$

as we already saw from another point of view was the case
in a hierarchical battery, in Section 4 of Chapter I.
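
The partial correlation coefficient in its usual form is easily put into a short function. The sketch below is illustrative only (the name `partial_correlation` is an assumption); it is applied to the fictitious Massachusetts correlations for constant Stanford-Binet score, to the Figure 5 example of Chapter IX, where holding g constant yields the specific correlation of Tests 2 and 3, and to a pair of hierarchical tests with g saturations .9 and .8.

```python
import math

def partial_correlation(r_ij, r_ik, r_jk):
    """Correlation of i and j with test (or factor) k held constant."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Massachusetts example: Test 1 (Stanford-Binet) held constant.
r23_1 = partial_correlation(0.54, 0.69, 0.75)
r24_1 = partial_correlation(0.18, 0.69, 0.32)
r34_1 = partial_correlation(0.06, 0.75, 0.32)

# Figure 5 example: g held constant, both g saturations being sqrt(.5).
specific_23 = partial_correlation(0.8, math.sqrt(0.5), math.sqrt(0.5))

# Hierarchical pair: g is the sole cause of the correlation .72.
hierarchical = partial_correlation(0.72, 0.9, 0.8)

print(round(r23_1, 3), round(r24_1, 3), round(r34_1, 3))
print(round(specific_23, 3), round(hierarchical, 3))
```

In the Figure 5 case the numerator is the residue .3 found in Chapter IX, and the denominator reduces it to the specific correlation proper.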

3. Effect on communalities. The formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

is thus a very useful formula, including partial correlation
as a special case. If the original variances are each taken
as unity, the numerator $R_{ij} - q_i q_j$ for $i \neq j$ gives the new
covariances, while $p_i^2$ and $p_j^2$ are the new variances.

It also includes as a special case the formula known as
the Otis-Kelley formula, which is applicable when two
variates have both shrunk to the same extent (a restriction
not always recognized). If we put $q_i = q_j = q$ and therefore
$p_i = p_j = p$, it becomes

$$r_{ij} = \frac{R_{ij} - q^2}{p^2} = \frac{R_{ij} - 1 + p^2}{p^2},$$

$$p^2 (1 - r_{ij}) = 1 - R_{ij},$$

$$\frac{1 - R_{ij}}{1 - r_{ij}} = p^2 = \frac{\sigma_i^2}{\Sigma_i^2} = \frac{\sigma_j^2}{\Sigma_j^2},$$

the Otis-Kelley formula.

It has a still further application (Thomson, 1938b, 456),
for if a matrix of correlations in the wider population has
been analysed by Thurstone's process, this same formula
gives the new communalities (with one exception) to be
expected in the sample, if we put i = j and understand by
$R_{ii}$ the communality in the wider population, by $r_{ii}$ the
communality in the sample (and not a reliability coefficient,
which is the usual meaning of this symbol). Writing the
usual symbol $h^2$ for communality (capital $H^2$ for the wider
population, small $h^2$ for the sample) we have the formula in
the form

$$h_i^2 = \frac{H_i^2 - q_i^2}{p_i^2}.$$


The exception is the new communality of the trait or 
quality which has been directly selected, in our example 
No. 1, the Stanford-Binet scores. For the directly selected
trait the new communality is given by 



(Thomson, 1938b, 455 ; and see also Ledermann, 1938b).
With these formulae we can see what is likely to happen 
to a whole factorial analysis when the persons who are the 
subjects of the tests are only a sample of the wider popula- 
tion in which the analysis was first made. 
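
For the indirectly selected tests the communality formula can be sketched as below. The figures are borrowed, purely for illustration, from the hierarchical example of the next section (communalities .64, .49, and .36 for Tests 2, 3, and 4, their correlations with Test 1, and $p_1^2 = .36$); the function name is an assumption, and the directly selected Test 1 is deliberately left out because it follows a different formula.

```python
import math

def sample_communalities(H2, R1, p1_sq):
    """New communalities of indirectly selected tests.

    H2    -- communalities in the wider population
    R1    -- correlations of those tests with the directly selected test
    p1_sq -- variance of the selected test in the sample (population variance 1)
    """
    q1 = math.sqrt(1 - p1_sq)
    result = []
    for h2, r1 in zip(H2, R1):
        q_sq = (q1 * r1) ** 2            # squared shrinkage of this test
        result.append((h2 - q_sq) / (1 - q_sq))
    return result

new_h2 = sample_communalities([0.64, 0.49, 0.36], [0.72, 0.63, 0.54], 0.36)
print([round(x, 4) for x in new_h2])
```

Each new communality is smaller than the old, since selection removes common-factor variance from these tests while leaving their specific variance untouched.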

4. Hierarchical numerical example. We shall take, in 
the first place, the perfectly hierarchical example of our 
Chapters I and II. But to save space in the tables we 
shall consider only the first four tests. Their matrix of 
correlations, with the one common factor and the four 
specifics added, and with communalities inserted in the 
diagonal cells, was as follows : 





         1      2      3      4      g     s_1    s_2    s_3    s_4
1      (.81)   .72    .63    .54    .90    .44     .      .      .
2       .72   (.64)   .56    .48    .80     .     .60     .      .
3       .63    .56   (.49)   .42    .70     .      .     .71     .
4       .54    .48    .42   (.36)   .60     .      .      .     .80
g       .90    .80    .70    .60   1.00     .      .      .      .
s_1     .44     .      .      .      .    1.00     .      .      .
s_2      .     .60     .      .      .      .    1.00     .      .
s_3      .      .     .71     .      .      .      .    1.00     .
s_4      .      .      .     .80     .      .      .      .    1.00



The bottom right-hand quadrant shows, by its zero 
entries, that the factors are all uncorrelated with one 
another, that is, orthogonal. The tests expressed as linear
functions of the factors are

$z_1 = .9g + .436 s_1$
$z_2 = .8g + .600 s_2$
$z_3 = .7g + .714 s_3$
$z_4 = .6g + .800 s_4$

These equations are only another way of expressing the 
same facts as are shown in the north-east, or the south- 
west, quadrant of the matrix (where only two places of 
decimals are used for the specific loadings, to keep the 
printing regular). 

Let us now suppose that this matrix and these equations 
refer to a wide and defined population, e.g. all Massa- 
chusetts eleven-year-olds, and let us ask what will be the 
most likely matrix of correlations between these tests and 
factors to be found in a sample chosen by their scores in 
Test 1 so as to be more homogeneous. The variance of 
Test 1 in the wider population being taken as unity, let 
us take that in the more homogeneous select sample as
being $p_1^2 = .36$, so that $q_1 = .8$. We then have, using
$q_i = q_1 R_{i1}$ and $p_i^2 = 1 - q_i^2$, and
treating g and the specifics just like tests, the following
table : 
                   1      2      3      4      g     s_1    s_2    s_3    s_4
q                 .80   .576   .504   .432   .720   .349     .      .      .
p                 .60   .817   .864   .902   .694   .937    1      1      1
p^2 (variance)    .36   .668   .746   .813   .482   .878    1      1      1



For the correlations and communalities, using our 
formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

we get (again printing only two decimal places):
         1      2      3      4      g     s_1    s_2    s_3    s_4
1      (.61)   .53    .44    .36    .78    .28     .      .      .
2       .53   (.46)   .38    .31    .68   -.26    .73     .      .
3       .44    .38   (.32)   .26    .56   -.22     .     .83     .
4       .36    .31    .26   (.21)   .46   -.18     .      .     .89
g       .78    .68    .56    .46   1.00   -.39     .      .      .
s_1     .28   -.26   -.22   -.18   -.39   1.00     .      .      .
s_2      .     .73     .      .      .      .    1.00     .      .
s_3      .      .     .83     .      .      .      .    1.00     .
s_4      .      .      .     .89     .      .      .      .    1.00

In the more homogeneous sample, therefore, the
correlations and the communalities of all the tests have
sunk. The g column shows what the new correlations of g
are with the tests; and on examination of the matrix we
see that these, when cross-multiplied with one another,
still give the rest of the matrix. Thus

$.78 \times .46 = .36 \quad (r_{14})$
$.68^2 = .46 \quad (h_2^2)$

The test matrix is still of rank 1 (Thomson, 19386, 453), 
and these g-column entries can become the diminished 
loadings of the single common factor required by Rank 1. 
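
That the sample matrix keeps rank 1 can be checked by cross-multiplying the new g correlations for every pair of tests. The sketch below is an illustration only, assuming the population loadings .9, .8, .7, .6 and the shrinkage .8 of the example; the variable names are not from the original.

```python
from math import sqrt

loadings = [0.9, 0.8, 0.7, 0.6]   # population g loadings of Tests 1-4 (assumed)
q1 = 0.8                          # shrinkage of Test 1, so p_1^2 = .36

# Shrinkages q_i = q_1 * R_i1, for the tests and for g itself.
q = [q1] + [q1 * loadings[0] * l for l in loadings[1:]]
q_g = q1 * loadings[0]
p = [sqrt(1.0 - x * x) for x in q]
p_g = sqrt(1.0 - q_g * q_g)

def r_tests(i, j):
    """Sample correlation between tests i and j (i != j)."""
    return (loadings[i] * loadings[j] - q[i] * q[j]) / (p[i] * p[j])

# Sample correlation of each test with the population factor g.
r_g = [(l - qi * q_g) / (pi * p_g) for l, qi, pi in zip(loadings, q, p)]

# Largest discrepancy between r_ij and the product r_ig * r_jg.
max_gap = max(abs(r_tests(i, j) - r_g[i] * r_g[j])
              for i in range(4) for j in range(4) if i != j)

print([round(x, 3) for x in r_g])
print(max_gap)
```

The discrepancy is zero apart from rounding in the machine arithmetic, so the g-column entries can serve as loadings of a single common factor.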

The columns for the specifics $s_2$, $s_3$ (and later specifics
also) still show only one entry. In the bottom right-hand
quadrant, zero entries show that these specifics are still
uncorrelated with one another and with g, that is, g, $s_2$, $s_3$,
and $s_4$ are still orthogonal.

But something has happened to the specific $s_1$. It has
become correlated with g, and with all the tests. It has 
become an oblique factor, orthogonal still to the other 
specifics, but inclined to g and the tests. It leans further 
away from Test 1 than it formerly did, and makes obtuse 
angles (negative correlation) with the other tests and with g, 
to which it was originally orthogonal. 

But since, as we have already pointed out, the test matrix 
with the reduced communalities is still of rank 1, it is 
clear that a fresh analysis could be made of the tests into 
one common factor and specifics, thus 

$z_1' = .778g' + .628 s_1'$
$z_2' = .679g' + .734 s_2$
$z_3' = .562g' + .827 s_3$
$z_4' = .462g' + .887 s_4$

In these equations the factors $g'$, $s_1'$, $s_2$, $s_3$, and $s_4$ are
again orthogonal (uncorrelated), and the loadings shown
give the correlations and give unit variances. This is the
analysis which an experimenter would make who began
with the sample and knew nothing about any test measure-
ments in the whole population.

The reader, comparing the loadings in these equations
with the correlations in the matrix of the sample, will
rightly conclude that the specifics from $s_2$ onward have not
changed. In the matrix it is clear that they are still
orthogonal, and their correlations with the tests, in the
matrix, are the same as their loadings in the equations.
The tests are, in the sample, more heavily loaded with these
specifics than they were in the population, but the specifics
are the same in themselves.

The new specific $s_1'$ the reader will readily agree to be
different from $s_1$. The latter became oblique in the sample,
whereas $s_1'$ is orthogonal. What now is to be said about
the common factors g (in the population) and g' (in the
sample)? From the fact that the loadings of g', in the
sample equations, are identical with the correlations of the 
original g with the tests, in the sample matrix, one is 
tempted to imagine g' and g to be identical in nature. But 
that is not so certain. 

If we go back to the equations of the tests in the popu- 
lation, we can rewrite them in the following form 

$z_1 = .467g' + .800g'' + .377 s_1'$
$z_2 = .555g' + .576g'' + .600 s_2$
$z_3 = .485g' + .504g'' + .714 s_3$
$z_4 = .417g' + .432g'' + .800 s_4$

with two common factors g' and g'' instead of one common
factor g. These equations still give the same correlations.
For example

$r_{14} = .467 \times .417 + .800 \times .432 = .540$ as before.

In these equations the specifics $s_2$, $s_3$, $s_4$ are the same, and
the communalities of Tests 2, 3, and 4 are the same. All
that we have done in these three tests is to divide the
common factor g into two components. The ratio of the
loading of g'' to the loading of g' is the same in each of
them. The loadings of g'' we have made identical with the
shrinkages q in the table on page 177.

In Test 1 also we have made the loading of g'' equal to
the shrinkage $q_1 = .8$. But in this test g'' cannot be looked
upon merely as a component of g. To give the correct
correlations, the loading of g' has to be .467 as shown, and
the communality of Test 1 has been raised from its former
value (.81) to

$.467^2 + .800^2 = .858$

while the loading of the specific has correspondingly sunk.
The factors g', g'', and $s_1'$ are a totally new analysis of
Test 1 in the population. Part of the former specific has 
been incorporated in the common factors. 
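
The division of g into the two components g' and g'' can be retraced numerically. The sketch below is illustrative only and its variable names are not from the original. It assumes that the loadings of g'' are set equal to the shrinkages, that in Tests 2 to 4 the loading of g' takes whatever is left of the old communality, and that in Test 1 the loading of g' is fixed by requiring the correlation with Test 2 to be reproduced.

```python
from math import sqrt

loadings = [0.9, 0.8, 0.7, 0.6]   # population loadings on g (assumed example)
q1 = 0.8                          # shrinkage of the directly selected Test 1

# Loadings on g'' are the shrinkages q_i = q_1 * R_i1.
g_double = [q1] + [q1 * loadings[0] * l for l in loadings[1:]]

# Tests 2-4: g is merely split, so the loading on g' is what remains.
g_prime_rest = [sqrt(l * l - b * b)
                for l, b in zip(loadings[1:], g_double[1:])]

# Test 1: choose the g' loading so that r_12 = .72 is still reproduced.
a1 = (loadings[0] * loadings[1] - g_double[0] * g_double[1]) / g_prime_rest[0]
g_prime = [a1] + g_prime_rest

communality_1 = g_prime[0] ** 2 + g_double[0] ** 2
specific_1 = sqrt(1.0 - communality_1)

# Every population correlation must still be reproduced by g' and g''.
worst = max(abs(g_prime[i] * g_prime[j] + g_double[i] * g_double[j]
                - loadings[i] * loadings[j])
            for i in range(4) for j in range(i + 1, 4))

print([round(x, 3) for x in g_prime])
print(round(communality_1, 3), round(specific_1, 3), worst)
```

Fixing the Test 1 loading from a single correlation is enough here because the battery is of rank 1; the check on `worst` confirms that the other correlations follow.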

Now let the factor g'' be abolished, i.e. held constant, so
that the tests (now of less than unit variance, so we write
them with x instead of z) are

                                  Variances
$x_1 = .467g' + .377 s_1'$          .360
$x_2 = .555g' + .600 s_2$           .668
$x_3 = .485g' + .714 s_3$           .746
$x_4 = .417g' + .800 s_4$           .813

The reduced variances are the sum of the squares of the
surviving loadings, e.g.

$.467^2 + .377^2 = .360$

The variances, it will be seen, are the $p^2$'s of our tests
as measured in the sample. If each of the last set of
equations is divided through by the square root of its
variance, we arrive at the equations

$z_1' = .778g' + .628 s_1'$
$z_2' = .679g' + .734 s_2$
$z_3' = .562g' + .827 s_3$
$z_4' = .462g' + .887 s_4$

which is the analysis already given as that of an experi-
menter who knew only the sample. As to the nature of g',
we can say in Tests 2, 3, and 4 that it is possible to regard 
it as a component of the g of the population. But we 
cannot do so with assurance in Test 1. There its nature is 
more dubious. At all events, it is not the same common 
factor as in the population, and at best we can say that it 
is one of its components. 
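
The arithmetic of holding g'' constant and then restandardizing is short enough to repeat by machine. A minimal sketch, using the three-decimal loadings quoted in the text (the variable names are illustrative):

```python
from math import sqrt

# Loadings that survive once g'' is held constant (three decimals, from the text).
g_prime = [0.467, 0.555, 0.485, 0.417]
specific = [0.377, 0.600, 0.714, 0.800]

# Reduced variance of each test: sum of squares of the surviving loadings.
variances = [a * a + s * s for a, s in zip(g_prime, specific)]

# Dividing each equation by the square root of its variance restandardizes it.
restandardized = [(a / sqrt(v), s / sqrt(v))
                  for a, s, v in zip(g_prime, specific, variances)]

for v, (a, s) in zip(variances, restandardized):
    print(round(v, 3), round(a, 3), round(s, 3))
```

Because the inputs are already rounded to three places, the results agree with the printed figures only to about the third decimal.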

5. A sample all alike in Test 1. These phenomena are 
still more striking if we consider a case where the sample 
is composed of persons who are all alike in Test 1. It
would be an excellent exercise for the reader to work out
the resulting matrix of correlations for tests and population
factors in this case. The tests act in this case as though
their original equations in the population had been

$z_1 = g''$
$z_2 = .349g' + .720g'' + .600 s_2$
$z_3 = .305g' + .630g'' + .714 s_3$
$z_4 = .262g' + .540g'' + .800 s_4$

and then g'' had become zero, i.e. a constant with no
variance.

It perhaps helps to a further understanding of what is 
happening to the factors during selection if we realize that 
holding the score of Test 1 constant does not hold its factors
g and $s_1$ constant. They can vary in the sample from
man to man, but since

$z_1 = .9g + .436 s_1$

remains constant, a man in the sample who has a high g
must have a low $s_1$; that is, these factors are negatively
correlated in the sample. And because they are thus
negatively correlated, those members of the sample who
have high g's, and who will therefore tend to do well in
Tests 2, 3, and 4, will tend to have values below average
(negative values) for their $s_1$, which will be therefore
negatively correlated with these tests, in this sample.
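
How strongly g and $s_1$ become correlated depends on how severe the selection is. The sketch below applies the selection formula to the pair (g, $s_1$), which are uncorrelated in the population; the function name `corr_g_s1` and its default loading of .9 are assumptions made for illustration.

```python
from math import sqrt

def corr_g_s1(q1, loading=0.9):
    """Expected sample correlation between g and the specific of Test 1.

    q1      -- shrinkage of Test 1 (q1 = 1 means all persons alike in Test 1)
    loading -- population loading of Test 1 on g
    """
    specific = sqrt(1.0 - loading * loading)    # loading of Test 1 on s_1
    q_g = q1 * loading                          # shrinkage of g
    q_s = q1 * specific                         # shrinkage of s_1
    # Population correlation of g and s_1 is zero, so the numerator is -q_g*q_s.
    return -q_g * q_s / sqrt((1.0 - q_g * q_g) * (1.0 - q_s * q_s))

for q1 in (0.0, 0.5, 0.8, 1.0):
    print(q1, round(corr_g_s1(q1), 3))
```

With no selection the factors stay uncorrelated; with $q_1 = .8$ the value is about -.39, as in the sample matrix of Section 4; and when all persons are alike in Test 1 the two factors are perfectly negatively correlated, since each then determines the other.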

So far in our examples we have assumed the sample to 
be more homogeneous than the population. But a sample 
can be selected to be less homogeneous. In such a case 
the same formulae will serve, if we simply make the capital 
letters refer to the sample and the small to the population. 
In fact, the same tables, with their rôles reversed, can
illustrate this case. In practical life we usually know which 
of two groups we would call the sample, and which the 
population. But mathematically there is no distinction, 
the one is a distortion of the other, and which is the " true " 
state of affairs is a question without meaning. 

It must also throughout be remembered that all these 
formulae and statements refer, not to consequences which 
are certain to follow, but to consequences which are to be 
expected. If actual samples were made the values experi-
mentally found in them for correlations, communalities,
loadings, etc., would oscillate about those given by our
formulae, violently in the case of small samples, only
slightly in the case of large samples.
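
This oscillation about the expected values can be watched in a small simulation. The sketch below is an illustration under stated assumptions, not a procedure from the original: scores are drawn as normal deviates, Tests 1 and 2 are built from g and their specifics with loadings .9 and .8, and the sample is formed by the (arbitrary) rule of keeping persons whose Test 1 score lies within one standard deviation of the mean. The expected correlation is then computed from the variance of Test 1 actually found in the sample.

```python
import random
from math import sqrt

random.seed(1938)   # fixed seed so that the run is repeatable

pairs = []
for _ in range(100_000):
    g, s1, s2 = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    z1 = 0.9 * g + sqrt(0.19) * s1      # Test 1 in the population
    z2 = 0.8 * g + 0.6 * s2             # Test 2 in the population
    if abs(z1) < 1.0:                   # assumed selection rule on Test 1
        pairs.append((z1, z2))

n = len(pairs)
m1 = sum(a for a, _ in pairs) / n
m2 = sum(b for _, b in pairs) / n
var1 = sum((a - m1) ** 2 for a, _ in pairs) / n
var2 = sum((b - m2) ** 2 for _, b in pairs) / n
cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
r_observed = cov / sqrt(var1 * var2)

# Expectation from the selection formula, with R_12 = .9 * .8 = .72.
q1 = sqrt(1.0 - var1)
q2 = q1 * 0.72
r_expected = (0.72 - q1 * q2) / sqrt((1.0 - q1 * q1) * (1.0 - q2 * q2))

print(n, round(var1, 3), round(var2, 3))
print(round(r_observed, 3), round(r_expected, 3))
```

With a sample of this size the observed and expected correlations agree closely; cutting the number of persons drawn makes the disagreement larger and more erratic.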

6. An example of rank 2. The above example has only 
one common factor. We turn next to consider an example 
with two. Again it is, we suppose, the first test according 
to which the sample is deliberately selected, and again 
we suppose the "shrinkage" $q_1$ to be .8. The matrices
of correlations and communalities, in the population and
in the sample, are then as follows, the two factors $f_1$ and $f_2$
and the specifics being treated in the calculation exactly 
as if they were tests. To economize room on the page, 
we omit the later specifics : 

Correlations in the Population 





         1      2      3      4      5     f_1     f_2     s_1     s_2
1      (.65)   .46    .59    .36    .41    .70     .40     .59      .
2       .46   (.37)   .36    .26    .23    .60     .10      .      .79
3       .59    .36   (.61)   .32    .45    .50     .60      .       .
4       .36    .26    .32   (.20)   .22    .40     .20      .       .
5       .41    .23    .45    .22   (.34)   .30     .50      .       .
f_1     .70    .60    .50    .40    .30  (1.00)     .       .       .
f_2     .40    .10    .60    .20    .50     .    (1.00)     .       .
s_1     .59     .      .      .      .      .       .    (1.00)     .
s_2      .     .79     .      .      .      .       .       .    (1.00)



Correlations in the Sample 





         1      2      3      4      5     f_1     f_2     s_1     s_2
1      (.40)   .30    .40    .23    .26    .51     .25     .40      .
2       .30   (.27)   .23    .17    .12    .51    -.02    -.21     .85
3       .40    .23   (.50)   .22    .35    .32     .54    -.29      .
4       .23    .17    .22   (.13)   .14    .30     .12    -.16      .
5       .26    .12    .35    .14   (.26)   .15     .44    -.19      .
f_1     .51    .51    .32    .30    .15  (1.00)   -.23    -.36      .
f_2     .25   -.02    .54    .12    .44   -.23   (1.00)   -.18      .
s_1     .40   -.21   -.29   -.16   -.19   -.36    -.18   (1.00)     .
s_2      .     .85     .      .      .      .       .       .    (1.00)



We see here a new phenomenon. The two common
factors $f_1$ and $f_2$ in the population were orthogonal to one
another, as is shown by the zero correlation between them.
But in the sample they are negatively correlated (-.228);
that is, they are oblique. We begin to see a generalization 
which can be algebraically proved, that all the factors, 
common and specific, which are concerned with the directly 
selected test(s) become oblique to each other and to all the tests, 
but the specifics of the indirectly selected tests remain orthogonal 
to everything, except each to its own test. 
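
The generalization can be illustrated on the rank 2 example by applying the selection formula to pairs of factors. In the sketch below the function `sample_r` and its argument names are illustrative; the population loadings of Test 1 (.7 on $f_1$, .4 on $f_2$, and the square root of .35 on $s_1$) are those of the example.

```python
from math import sqrt

def sample_r(R_ab, R_a1, R_b1, q1):
    """Expected sample correlation of variables a and b after selecting on Test 1.

    R_ab        -- population correlation between a and b
    R_a1, R_b1  -- population correlations of a and of b with Test 1
    q1          -- shrinkage of Test 1
    """
    q_a, q_b = q1 * R_a1, q1 * R_b1
    return (R_ab - q_a * q_b) / sqrt((1.0 - q_a ** 2) * (1.0 - q_b ** 2))

q1 = 0.8
# Factors concerned with the directly selected Test 1 become oblique ...
r_f1_f2 = sample_r(0.0, 0.7, 0.4, q1)
r_f1_s1 = sample_r(0.0, 0.7, sqrt(0.35), q1)
# ... but the specific of Test 2 has no correlation with Test 1, so it is untouched.
r_f1_s2 = sample_r(0.0, 0.7, 0.0, q1)

print(round(r_f1_f2, 3), round(r_f1_s1, 3), r_f1_s2)
```

The specific of an indirectly selected test has zero shrinkage, which is why its zero correlations with the other factors survive the selection.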

But the matrix of the tests themselves is still of rank 2, 
and an experimenter working only with the sample would 
find this out, although he would know nothing about the 
population matrix. He would therefore set to work to 
analyse it into two common factors, orthogonal to one 
another. A Thurstone analysis comes out in two common 
factors exactly, and can be rotated until all the loadings 
are positive. For example : 



Test             1      2      3      4      5
Factor f_1'    .570   .521   .436   .332   .238
Factor f_2'    .276     .    .555   .130   .452

These factors f', however, are clearly a different pair
from the factors f in the original population. In the
sample, those original factors (f) are oblique; these (f')
are orthogonal.
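
That these two orthogonal factors do account for the sample matrix can be confirmed by forming the inner products of the loadings. A minimal sketch (the variable names are illustrative; the printed matrix is copied to two decimals from the table of correlations in the sample, with communalities in the diagonal):

```python
# Loadings of the five tests on the two orthogonal sample factors f_1', f_2'.
f1_prime = [0.570, 0.521, 0.436, 0.332, 0.238]
f2_prime = [0.276, 0.000, 0.555, 0.130, 0.452]

# Test-by-test part of the sample matrix, communalities in the diagonal.
printed = [
    [0.40, 0.30, 0.40, 0.23, 0.26],
    [0.30, 0.27, 0.23, 0.17, 0.12],
    [0.40, 0.23, 0.50, 0.22, 0.35],
    [0.23, 0.17, 0.22, 0.13, 0.14],
    [0.26, 0.12, 0.35, 0.14, 0.26],
]

# Each entry is the inner product of the two tests' loadings.
reproduced = [[a1 * b1 + a2 * b2 for b1, b2 in zip(f1_prime, f2_prime)]
              for a1, a2 in zip(f1_prime, f2_prime)]

worst = max(abs(reproduced[i][j] - printed[i][j])
            for i in range(5) for j in range(5))
print(worst)   # no larger than the rounding of a two-decimal table
```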

Again the whole phenomenon is reversible. The second 
matrix (with the orthogonal factors f') might refer to the
population, and a sample picked with a suitable increased
scatter of Variate 1. All our formulae could be worked
backwards, and we should arrive at the matrix beginning
(.65), referring now to the sample. The f' factors would
have become oblique, and a new analysis, suitably rotated,
would give us the other factors f.

It becomes evident that the factors we obtain by the 
analysis of tests depend upon the subpopulation we have 
tested. They are not realities in any physical sense of the 
word ; they vary and change as we pass from one body of 
men to another. It is possible, and this is a hope hinted 
at in Thurstone's book The Vectors of Mind, that if we
could somehow identify a set of factors throughout all
their changes from sample to sample (in most of which
they would be oblique) as being in some way unique, we
might arrive at factors having some measure of reality 
and fixity. Thurstone, in his latest book Multiple Factor 
Analysis, believes that he has achieved this, and that his 
Simple Structure is invariant. His claim is considered 
below in Section 9 of our Chapter XVIII on Oblique 
Factors. 

7. A simple geometrical picture of selection. The
geometrical picture of correlation between tests and factors
which was described in Chapter IV is of some help in seeing
exactly what happens to factors under selection in some test or
trait. In Figure 23, $x_1$ represents the vector of the test or
trait which is to be directly selected for, and g and $s_1$ are
the axes of the common factor and of its specific, taking the
case of one common factor only. The circle indicates the
circular nature of the crowd of points which represent the
population. It is a line of equal density of that crowd,
which is densest at the origin and thins off equally in all
directions. One-quarter of that crowd are above average in
both g and $s_1$, and another quarter are below average in both.
The correlation between g and $s_1$ is zero.

Figure 23.

Figure 24.


But in the selected sample (Figure 23) the scatter of the 
persons along the test vector $x_1$ has been reduced. Persons
have been removed from the whole crowd to leave the
sample, but they have not been removed equally over the
whole crowd. The line of equal density has become an
ellipse, which is shorter along the line of the test vector $x_1$
than at right angles to that line. If we now compare the
figure with Figure 18 in Chapter V (page 67), we see that
it represents a state of negative correlation between g and $s_1$.
Less than one-quarter are now above average in both
g and $s_1$, less than one-quarter below average in both. A
majority are cases of being above average in the one factor 
and below in the other. 

An experimenter coming first to the sample and knowing 
nothing about the population will naturally standardize 
each of his tests. He can do, indeed, nothing else. That 
is to say, he treats the crowd as again symmetrical and 
our ellipse as a circle (Figure 24). In his space, therefore, 
the lines g and $s_1$ will be at an obtuse angle, just as the
axes in Section 2 of Chapter V became acute. He knows 
nothing about these lines, but chooses new axes for himself. 
If these are at right angles, one of them may be one of the 
old axes, but they cannot both coincide with the old axes. 

8. Random selection. These considerations, in Sections 
1-7, deal with the results to be expected when a sample 
is deliberately selected so that the variance of one test is 
changed to some desired extent. The new variances and 
the changed correlations of the other tests given by our
formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

are not the certain result of our action in selecting for Test 1.
If we selected a large number of samples of the same size, 
all with the same reduced variance in Test 1, they would 
not all be alike in the resulting correlations. On the con- 
trary, they would all be different. But most of them would 
be like the expected set, few would depart widely from that ; 
and the departures would be in both directions, some 
samples lying on the one side, others on the other side, 
of our expectation. 

If now, instead of selecting samples which are all alike 
in the variance of one nominated test, we take a large 
number of random samples of the same size, what would we 
find ? Among them would be a number which were alike 
in the variance of Test 1, and these in the other part of 
the correlation matrix would have values which varied 
round about those given by our formula. We could also 
pick out, instead of a set all alike in the variance of Test 1,
a different set all alike in the variance of Test 4, say;
and these would have values in the remainder of the matrix 
oscillating about our formula, in which Test 4 would replace 
Test 1. In short, a complex family of random samples 
would show a structure among themselves such that if we 
fix any one variance the average of that array of samples 
obeys our formula.* Random sampling will not merely 
add an " error specific " to existing factors, it will make 
complex changes in the common factors. 

9. Oblique factors. This chapter, and the next, and the 
original articles on which they are based, were written be- 
fore Thurstone's work with oblique factors had been pub- 
lished. In those days it was assumed that analyses were 
into orthogonal or uncorrelated factors, and that only such 
factors could be looked upon as separate entities with any 
degree of reality inherent in them. When oblique factors 
are permitted, however, and methods for determining them 
devised (as is now the case in 1947), it becomes conceivable 
that the same factors might persist despite selection, and 
be discoverable, though their correlations with each other 
would change. Thurstone to-day urges that this is indeed 
so, and his arguments to this effect are discussed at the 
end of our Chapter XVIII, after his Oblique Simple Struc- 
ture has been explained. 

* On the author's suggestion, Dr. W. Ledermann has since 
proved this conjecture analytically (Biometrika, 1939, XXX, 295-
304). His results cover also the case of multivariate selection (see
next chapter). 



CHAPTER XII 

THE INFLUENCE OF MULTIVARIATE 
SELECTION * 

1. Altering two variances and the covariance. In the pre- 
ceding chapter we have discussed the changes which occur 
in the variances and correlations of a set of tests, and in 
their factors, when the sample of persons tested is chosen 
according to their performance in one of the tests : we 
are next going to see the results of picking our sample by 
their performances in more than one of the tests, first of 
all in two of them. Take again, the perfectly hierarchical 
example of the last chapter and of Chapters I and II. We 
must this time go as far as six tests in order to see all the 
consequences. The matrix of correlations of these tests 
and their factors will be simply an extension of that 
printed on page 176. 

Now let us imagine a sample picked so that the variance 
of Test 1 and also that of Test 2 is intentionally altered, 
and further, their covariance (and hence their correlation) 
changed to some predetermined value. 

It is at once clear that in these two directly selected 
tests the factorial composition will in general be changed,
and can indeed be changed to anything which is not incom-
patible with common sense and the laws of logic. What,
however, will be the resulting sympathetic changes in the
variances and covariances of the other tests of the battery?

In Chapter XI we altered the variance of Test 1 from
unity to .36. The consequent diminution in variance to be
expected in Test 2 was, as is shown on page 177, from
unity to .668, and the consequent change in correlation
from .72 to .53. Here, however, let us pick our sample so
that the variance of the second test is also diminished to
.36, and so that the correlation between them, instead of
falling, rises to .833. We have, that is to say, chosen
people for our sample who tend to be rather more alike
than usual in these two test scores, as well as being closely
grouped in each, an unusual but not an inconceivable
sample. Natural selection (which includes selection by the
other sex in mating) has no doubt often preferred indi-
viduals in whom two organs tended to go together, as
long legs with long arms, and the same sort of thing might
occur in mental traits. In terms of variance and covariance
we have changed the matrix:

$$R_{pp} = \begin{bmatrix} 1.00 & .72 \\ .72 & 1.00 \end{bmatrix}$$

to the matrix:

$$V_{pp} = \begin{bmatrix} .36 & .30 \\ .30 & .36 \end{bmatrix}$$

for

$$\frac{.30}{\sqrt{.36 \times .36}} = \frac{5}{6} = .833,$$

the new correlation. Notice
that the diagonal entries here (unities in $R_{pp}$ and .36, .36
in $V_{pp}$) are the variances, not the communalities.

* Thomson, 1937; Thomson and Ledermann, 1938.

2. Aitken's multivariate selection formula. We shall
symbolically represent the whole original matrix of vari-
ances and covariances by:

$$R = \begin{bmatrix} R_{pp} & R_{pq} \\ R_{qp} & R_{qq} \end{bmatrix}$$

where the subscript p refers to the directly selected or
picked tests, and the subscript q to all the other tests and
the factors. $R_{pq}$ (and also $R_{qp}$) means the matrix of
covariances of the picked tests with all the others, including
the factors. $R_{qq}$ means the matrix of variances and
covariances of the latter among themselves. Since at the
outset the tests and factors are all assumed to be
standardized, the variances in this whole R matrix are all
unity, and the covariances are simply coefficients of
correlation. In our case the R matrix is:

Analysis in the Population 





        1     2     3     4     5     6     g    s_1   s_2   s_3   s_4   s_5   s_6
1     1.00   .72   .63   .54   .45   .36   .90   .44    .     .     .     .     .
2      .72  1.00   .56   .48   .40   .32   .80    .    .60    .     .     .     .
3      .63   .56  1.00   .42   .35   .28   .70    .     .    .71    .     .     .
4      .54   .48   .42  1.00   .30   .24   .60    .     .     .    .80    .     .
5      .45   .40   .35   .30  1.00   .20   .50    .     .     .     .    .87    .
6      .36   .32   .28   .24   .20  1.00   .40    .     .     .     .     .    .92
g      .90   .80   .70   .60   .50   .40  1.00    .     .     .     .     .     .
s_1    .44    .     .     .     .     .     .   1.00    .     .     .     .     .
s_2     .    .60    .     .     .     .     .     .   1.00    .     .     .     .
s_3     .     .    .71    .     .     .     .     .     .   1.00    .     .     .
s_4     .     .     .    .80    .     .     .     .     .     .   1.00    .     .
s_5     .     .     .     .    .87    .     .     .     .     .     .   1.00    .
s_6     .     .     .     .     .    .92    .     .     .     .     .     .   1.00



The $R_{pp}$ matrix is the square 2 × 2 matrix, the $R_{qq}$ matrix
the square 11 × 11 matrix, while $R_{pq}$ has two rows and
eleven columns, $R_{qp}$ being the same transposed.

Our object is to find what may be expected to happen
to the rest of the matrix when $R_{pp}$ is changed to $V_{pp}$.
Formulae for this purpose were first found by Karl Pearson,
and were put into the matrix form in which we are about
to quote them by A. C. Aitken (Aitken, 1934). The matrix
changes to:

$$\begin{bmatrix} V_{pp} & V_{pp} R_{pp}^{-1} R_{pq} \\ R_{qp} R_{pp}^{-1} V_{pp} & R_{qq} - R_{qp} \left( R_{pp}^{-1} - R_{pp}^{-1} V_{pp} R_{pp}^{-1} \right) R_{pq} \end{bmatrix}$$

and in order to explain the meaning of these formulae we
shall carry out the calculation for a part of the above matrix
only (the first four tests), with a strong recommendation to
the reader to perform the whole calculation systematically.
If we confine ourselves to the first four tests we have

$$R_{pp} = \begin{bmatrix} 1.00 & .72 \\ .72 & 1.00 \end{bmatrix} \qquad R_{qq} = \begin{bmatrix} 1.00 & .42 \\ .42 & 1.00 \end{bmatrix}$$

$$R_{pq} = \begin{bmatrix} .63 & .54 \\ .56 & .48 \end{bmatrix} \qquad R_{qp} = \begin{bmatrix} .63 & .56 \\ .54 & .48 \end{bmatrix}$$

3. The calculation of a reciprocal matrix. The most 
tiresome part of the calculation, if the number of directly 
selected tests is large, is to find $R_{pp}^{-1}$, the reciprocal of the
matrix $R_{pp}$. By the reciprocal of a matrix is meant
another matrix such that the product

$$R_{pp}^{-1} R_{pp} = I$$

where I is the so-called "unit matrix" which has unit
entries in the diagonal and zero entries everywhere else.
Such a reciprocal matrix can be found by means of Aitken's
method of pivotal condensation as follows (Aitken, 1937a).
Write the given matrix with the unit matrix below it and
minus the unit matrix on its right, thus:









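                                              Check
                                              Column
   1.00      .72   |  -1.0000       .     |    .7200
    .72     1.00   |     .       -1.0000  |    .7200
   1.00      .     |     .          .     |   1.0000
    .       1.00   |     .          .     |   1.0000
  ----------------------------------------------------
            .4816  |    .7200    -1.0000  |    .2016
          (1.0000  |   1.4950    -2.0764  |    .4186)
           -.7200  |   1.0000       .     |    .2800
           1.0000  |     .          .     |   1.0000
  ----------------------------------------------------
                   |   2.0764    -1.4950  |    .5814
                   |  -1.4950     2.0764  |    .5814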



As before, we divide the first row of each slab through
by its first member, writing the result in a row left blank
for that purpose. Each pivot is thus unity, the whole
calculation is made easier, and the process continues until
the left-hand column no longer has any contents, when
the numbers in the middle column are the reciprocal matrix.
For a larger example of this automatic form of calculation
see the Addendum on pages 350-1. That the matrix is
indeed the reciprocal we can check by direct calculation.
We have

$$\begin{bmatrix} 1.00 & .72 \\ .72 & 1.00 \end{bmatrix} \begin{bmatrix} 2.0764 & -1.4950 \\ -1.4950 & 2.0764 \end{bmatrix} = \begin{bmatrix} 1 & . \\ . & 1 \end{bmatrix}$$
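
A reciprocal matrix can also be obtained by machine. The sketch below is not Aitken's pivotal condensation as laid out in the text, but ordinary Gauss-Jordan elimination on the matrix with a unit matrix written beside it, which rests on the same idea of reducing by successive pivots. The function name `reciprocal` is illustrative.

```python
def reciprocal(matrix):
    """Reciprocal (inverse) of a square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Write the unit matrix on the right of the given matrix: [A | I].
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Take as pivot the largest remaining entry in this column.
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("matrix has no reciprocal (it is singular)")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        # Divide the pivot row through by the pivot, making the pivot unity.
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        # Clear the rest of the column.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right-hand block is now the reciprocal.
    return [row[n:] for row in aug]

R_pp = [[1.00, 0.72],
        [0.72, 1.00]]
R_pp_inv = reciprocal(R_pp)
for row in R_pp_inv:
    print([round(x, 4) for x in row])
```

For a matrix as small as this one the reciprocal could equally be written down from the determinant; the general routine is given because the labour grows quickly with the number of directly selected tests.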

Matrix multiplication is carried out by obtaining the 
inner products (see footnote, page 31) of the rows of the 
first matrix with the columns of the second. Thus 

$$1 \times 2.0764 - .72 \times 1.4950 = 1$$
$$-1 \times 1.4950 + .72 \times 2.0764 = 0$$

are the two upper entries in the product matrix. When the
reciprocal matrix $R_{pp}^{-1}$ has thus been calculated, the best
way of proceeding is to find

$$C = R_{pp}^{-1} R_{pq}$$
and $$D = R_{qq} - R_{qp} C$$

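In the case of our example these are

$$C = \begin{bmatrix} 2.0764 & -1.4950 \\ -1.4950 & 2.0764 \end{bmatrix} \begin{bmatrix} .63 & .54 \\ .56 & .48 \end{bmatrix} = \begin{bmatrix} .4709 & .4037 \\ .2209 & .1894 \end{bmatrix}$$

$$D = \begin{bmatrix} 1.00 & .42 \\ .42 & 1.00 \end{bmatrix} - \begin{bmatrix} .63 & .56 \\ .54 & .48 \end{bmatrix} \begin{bmatrix} .4709 & .4037 \\ .2209 & .1894 \end{bmatrix}$$

$$= \begin{bmatrix} 1.00 & .42 \\ .42 & 1.00 \end{bmatrix} - \begin{bmatrix} .4204 & .3604 \\ .3604 & .3089 \end{bmatrix} = \begin{bmatrix} .5796 & .0596 \\ .0596 & .6911 \end{bmatrix}$$

subtraction of matrices being carried out by subtracting
each element from the corresponding one. We next need

$$V_{pp} C = \begin{bmatrix} .36 & .30 \\ .30 & .36 \end{bmatrix} \begin{bmatrix} .4709 & .4037 \\ .2209 & .1894 \end{bmatrix} = \begin{bmatrix} .2358 & .2022 \\ .2208 & .1893 \end{bmatrix}$$

which gives us the new covariances of the directly selected
tests with those indirectly selected. For $V_{qq}$ we need still
$C'(V_{pp}C)$ where the prime indicates that the matrix is
transposed (rows becoming columns)

$$C'(V_{pp} C) = \begin{bmatrix} .4709 & .2209 \\ .4037 & .1894 \end{bmatrix} \begin{bmatrix} .2358 & .2022 \\ .2208 & .1893 \end{bmatrix} = \begin{bmatrix} .1598 & .1370 \\ .1370 & .1175 \end{bmatrix}$$

and then

$$V_{qq} = D + C' V_{pp} C = \begin{bmatrix} .5796 & .0596 \\ .0596 & .6911 \end{bmatrix} + \begin{bmatrix} .1598 & .1370 \\ .1370 & .1175 \end{bmatrix} = \begin{bmatrix} .7394 & .1966 \\ .1966 & .8086 \end{bmatrix}$$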
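
The chain of matrix products for the first four tests can be repeated by machine as a check on the arithmetic. The sketch below is an illustration only: the helper functions `matmul`, `transpose` and `inverse_2x2` are written out here and are not part of any method in the text, and the 2 × 2 reciprocal is taken directly from the determinant.

```python
def matmul(A, B):
    """Matrix product: inner products of the rows of A with the columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

R_pp = [[1.00, 0.72], [0.72, 1.00]]   # picked tests 1 and 2
R_pq = [[0.63, 0.54], [0.56, 0.48]]   # picked tests with tests 3 and 4
R_qq = [[1.00, 0.42], [0.42, 1.00]]   # tests 3 and 4 among themselves
V_pp = [[0.36, 0.30], [0.30, 0.36]]   # imposed variances and covariance

C = matmul(inverse_2x2(R_pp), R_pq)           # C = R_pp^-1 R_pq
R_qp_C = matmul(transpose(R_pq), C)
D = [[R_qq[i][j] - R_qp_C[i][j] for j in range(2)] for i in range(2)]
V_pq = matmul(V_pp, C)                        # new covariances, picked with others
extra = matmul(transpose(C), V_pq)            # C'(V_pp C)
V_qq = [[D[i][j] + extra[i][j] for j in range(2)] for i in range(2)]

for name, M in (("C", C), ("D", D), ("V_pq", V_pq), ("V_qq", V_qq)):
    print(name, [[round(x, 4) for x in row] for row in M])
```

Working to full machine precision gives figures that agree with the four-decimal values of the text to within a unit or two in the last place.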
We now can write down the whole new 4 × 4 matrix
of variances and covariances. In the same way, had we
included the other tests and the factors, we would have
arrived at the whole new 13 × 13 matrix for all the
variances and covariances which we now print.* The
values calculated above for the first four tests will be 
recognized in its top left-hand corner. (The diagonal 
entries are variances, not communalities.) 

Covariances in the Sample 





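        1     2     3     4     5     6     g    s_1   s_2   s_3   s_4   s_5   s_6
1      .36   .30   .24   .20   .17   .14   .34   .13   .05    .     .     .     .
2      .30   .36   .22   .19   .16   .13   .32   .04   .18    .     .     .     .
3      .24   .22   .74   .20   .16   .13   .33  -.14  -.07   .71    .     .     .
4      .20   .19   .20   .81   .14   .11   .28  -.12  -.06    .    .80    .     .
5      .17   .16   .16   .14   .87   .09   .23  -.10  -.05    .     .    .87    .
6      .14   .13   .13   .11   .09   .92   .19  -.08  -.04    .     .     .    .92
g      .34   .32   .33   .28   .23   .19   .47  -.19  -.10    .     .     .     .
s_1    .13   .04  -.14  -.12  -.10  -.08  -.19   .70   .32    .     .     .     .
s_2    .05   .18  -.07  -.06  -.05  -.04  -.10   .32   .43    .     .     .     .
s_3     .     .    .71    .     .     .     .     .     .   1.00    .     .     .
s_4     .     .     .    .80    .     .     .     .     .     .   1.00    .     .
s_5     .     .     .     .    .87    .     .     .     .     .     .   1.00    .
s_6     .     .     .     .     .    .92    .     .     .     .     .     .   1.00
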
4. Features of the sample covariances. Examination of
this matrix shows the following features : 

(1) The specifics of the indirectly selected tests have 
remained unchanged. They are still orthogonal to each 
other and all the other tests and factors (except each to 
its own test), are still of unit variance, and have still the 
same covariances with their own tests, though these will 
become larger correlations when the tests are restan- 
dardized ; 

(2) The specifics of the directly selected tests have 
become oblique common factors, correlated with everything 
except the other specifics ; 

(3) The matrix of the indirectly selected tests is still of
the same rank (here rank 1);

(4) The variances of the factors g, $s_1$, and $s_2$ have been
reduced to .47, .70, and .43.

* In such calculations on a larger scale, the methods of Aitken's
(1937a) paper are extremely economical. Triple products of matrices
of the form $XY^{-1}Z$ can thus be obtained in one pivotal operation
(see Appendix, paragraph 12; and Chapter VIII, Section 7, page
139).
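
The reduced variances of g, $s_1$ and $s_2$, and the obliqueness of the two specifics, follow from the same matrix formula with the three factors taken as the "other" variables. The sketch below is illustrative; the helper functions are written out for the purpose, and the population loadings (.9 and .8 on g, with specifics of the square roots of .19 and .36) are those of the hierarchical example.

```python
from math import sqrt

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

R_pp = [[1.00, 0.72], [0.72, 1.00]]
V_pp = [[0.36, 0.30], [0.30, 0.36]]
# Correlations of Tests 1 and 2 with g, s_1, s_2 in the population.
R_pq = [[0.9, sqrt(0.19), 0.0],
        [0.8, 0.0,        0.6]]
# The factors are orthogonal and standardized in the population.
R_qq = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

C = matmul(inverse_2x2(R_pp), R_pq)
R_qp_C = matmul(transpose(R_pq), C)
extra = matmul(transpose(C), matmul(V_pp, C))
# V_qq = R_qq - R_qp C + C' V_pp C
V_qq = [[R_qq[i][j] - R_qp_C[i][j] + extra[i][j] for j in range(3)]
        for i in range(3)]

for label, row in zip(("g", "s_1", "s_2"), V_qq):
    print(label, [round(x, 2) for x in row])
```

The diagonal gives the three reduced variances; the off-diagonal entries show g negatively related to both specifics and the two specifics positively related to each other.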

An experimenter beginning with this sample, and 
knowing nothing about the factors in the wider population, 
would have no means of knowing these relative variances, 
and would no doubt standardize all his tests. He certainly 
would not think of using factors with other than unit 
variance. And even if he were by a miracle to arrive at 
an analysis corresponding to the last table, with three 
oblique general factors, he would reject it (a) because of 
the negative correlations of some of the factors, and 
(b) because he can reach an analysis with only two common 
factors, and those orthogonal. It is therefore practically 
certain that he will not reach the population factors, at 
least as far as the directly selected tests are concerned. 
His data and his analysis will be as follows. The variances 
are all made unity and the covariances converted into 
correlations. The analysis into factors is a new one, not 
derived from the last table. 

Analysis in the Sample 
5. Appearance of a new factor. The most noticeable
change in this sample analysis, as compared with the
population analysis on page 189, is the appearance of a
new " factor " h linking the directly selected tests, a factor 
which is clearly due entirely to that selection. What 
degree of reality ought to be attributed to it ? Does it 
differ from the other factors really, or have they also been 
produced by selection, even in the population, which is 
only in its turn a sample chosen by natural selection from 
past generations ? 

Otherwise the analysis is still into one common factor 
and specifics. The loadings of the common factor are 
less than they were in the population, and this, as our table 
of variances and covariances shows, is due to a real 
diminution in the variance of the common factor. The 
new common factor g' is a component of the old one. 

The loadings of $s_1$ and $s_2$ have also sunk, because they 
have been in part turned into a new common factor. The 
loadings of the other specifics have risen. But this is 
entirely because the variance of the tests has sunk due to 
the shrinkage in g, and is not due to any new specifics 
being added. 

All these considerations make it very doubtful indeed 
whether any factors, and any loadings of factors, have 
absolute meaning. They appear to be entirely dependent 
upon the population in which they are measured, and for 
their definition there would be required not only a given 
set of tests and a given technical procedure in analysis, but 
also a given population of persons. 

In our example, the covariance of Tests 1 and 2 in the 
new matrix $V_{pp}$ was made larger than would naturally 
follow from the changed variances of Tests 1 and 2, so 
that the correlation increased. In consequence the new 
factor h is one with positive loadings in both tests. 

We might equally well, however, have decreased the 
covariance in $V_{pp}$, for example making 

$$V_{pp} = \begin{bmatrix} .36 & .04 \\ .04 & .36 \end{bmatrix}$$



and in that case (the reader is strongly recommended to 
carry out the calculations as an exercise) the new factor h 
will be an interference factor, with negative loading in one 
of the two tests. In this case the experimenter, with a 
dislike for such negative loadings, would probably " rotate " 
his factors away from any position which had any 
simple relation to the factors of the population. 

The formulae, moreover, can all be worked backward, 
the sample being treated as the population and the 
population as the sample ; though, as we said before, 
samples in real life are certainly, as a rule, more homogeneous 
in nearly every quality than the complete population. 

NOTES, 1945. In Professor Thurstone's coming new edition, 
or new book, part of which I have been privileged to see in manu- 
script, he gives what he mildly calls " a less pessimistic inter- 
pretation than Godfrey Thomson's of the factorial results of selec- 
tion." His newer work on oblique factors certainly entitles him, 
I think, to hope that invariance of underlying factorial structure 
may be shown to persist underneath the changes, and we shall 
await further results with interest. 

1948. Since the above was written the new book referred to has 
appeared (Multiple Factor Analysis). There in his Chapter XIX 
will be found Thurstone's above-mentioned interpretation, which is 
further described and discussed on our pages 292 et seq. below. 

1945. It has sometimes been thought that Karl Pearson's selection 
formulae used in these chapters are only applicable when the vari- 
ables concerned are normally distributed in both population and 
sample. They have, however, a much wider application (Lawley, 
1943) and in particular are still applicable when the sample has been 
made by cutting off a tail from a normal distribution, as happens 
when children above a certain intelligence quotient are selected for 
academic secondary education (Thomson, 1943a and 1944). 



PART IV 
CORRELATIONS BETWEEN PERSONS 



CHAPTER XIII 

REVERSING THE ROLES* 

1. Exchanging the rôles of persons and tests. In all the 
previous chapters the correlations considered have been 
correlations between tests, and the experiments envisaged 
were experiments in which comparatively few tests were 
administered to a large number of persons. For each test 
there would, therefore, be a long list of marks. The whole 
set of marks would make an oblong matrix, with a few 
rows for the tests, and a very large number of columns for 
the persons (we will choose that way of writing it, of the 
two possibilities). 

From such a set of marks we then calculated the 
correlation coefficients for each pair of tests, and our 
analysis of the tests into factors was based upon these. 
In the process of calculating a correlation coefficient we do 
such things to the row of marks in each test as finding its 
average, and finding its standard deviation. We quite 
naturally assume that we can legitimately carry out these 
operations. We assume, that is, that in the row of marks 
for one test these marks are comparable magnitudes which 
at any rate rise and fall with some mental quality even 
if they do not strictly speaking measure it in units, like 
feet or ounces. 

* The first explicit references to correlations between persons in 
connexion with factor technique seem to have been made independently 
and almost simultaneously by Thomson (1935b, July) and 
Stephenson (1935a, August), the former being pessimistic, the latter 
optimistic. But such correlations had actually been used much 
earlier by Burt and by Thomson, and almost certainly by others. 
See Burt and Davies, Journ. Exper. Pedag., 1912, 1, 251. 

The question we are going to ask in this part of this 
book is whether, in the above procedure, the rôles of persons 
and of tests can be exchanged (Thomson, 1935b, 75, 
Equation 17), and if so what light this throws upon 
factorial analysis. Instead of comparatively few tests 
(perhaps two or three dozen ; fifty-seven is the largest 
battery reported up to date) and a very large number of 
persons, suppose we have comparatively few persons, and 
a large number of tests, and find the correlations between 
the persons. In that case our matrix of marks would be 
oblong in the other direction, with a large number of 
rows for the tests, and a small number of columns for 
the persons, and each correlation, instead of being as 
before between two rows, would be between two columns. 
Taking only small numbers for purposes of an explanatory 
table, we would have in the ordinary kind of correlations 
a table of marks like this : 



[Table of marks: a few rows for the tests, many columns for the persons.] 



while for correlations between persons we would have a 
table of marks like this : 

[Table of marks: many rows for the tests, a few columns for the persons.] 



But we meet at once with a serious difficulty as soon as 
we attempt to calculate a correlation coefficient between 
two persons from the second kind of matrix. To do so, 
we must find the average of each column, just as previously 
we found the average of each row for the other kind of 
correlation. But to find the average of each column (by 
adding all the marks in that column together and dividing 
by their number) is to assume that these marks are in 
some sense commensurable up and down the column, 
although each entry is a mark for a different test, on a 
scoring system which is wholly arbitrary in each test 
(Thomson, 1935b, 75-6). 

To make this difficulty more obvious, let us suppose 
that the first four tests are : 

1. A form-board test ; 

2. A dotting test ; 

3. An absurdities test ; 

4. An analogies test. 

In each of these the experimenter has devised some 
kind of scoring system. Perhaps in the form-board test 
he gives a maximum of 20 points, and in the dotting test 
the score may be the number of dots made in half a minute. 
But to find the average of such different things as this is 
palpably absurd, and the whole operation can be entirely 
altered by an arbitrary change like taking the number of 
seconds to solve the form board instead of giving points. 
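The point can be checked numerically. The following is only a sketch: the marks are invented for illustration, and the "arbitrary change" is taken, as an assumption, to be a simple multiplication of the form-board points by ten. Under that change the correlation between two tests (two rows) is untouched, while the correlation between two persons (two columns) moves.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def column(matrix, j):
    """Marks of person j in every test."""
    return [row[j] for row in matrix]

# Hypothetical marks: one row per test, one column per person.
marks = [
    [18, 8, 12, 15, 10],    # form-board (points)
    [30, 60, 45, 38, 52],   # dotting (dots made)
    [6, 7, 9, 5, 8],        # absurdities
    [20, 22, 18, 25, 30],   # analogies
]

# An arbitrary change of scoring: form-board points multiplied by 10.
rescaled = [[10 * m for m in marks[0]]] + marks[1:]

# Correlation between two tests (rows), before and after the change.
r_tests_before = pearson(marks[0], marks[1])
r_tests_after = pearson(rescaled[0], rescaled[1])

# Correlation between two persons (columns), before and after the change.
r_persons_before = pearson(column(marks, 0), column(marks, 1))
r_persons_after = pearson(column(rescaled, 0), column(rescaled, 1))

print(r_tests_before, r_tests_after)
print(r_persons_before, r_persons_after)
```

A change from points to seconds would in general not even be linear, so the disturbance to the correlations between persons could be greater still.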

2. Ranking pictures, essays, or moods. This is a very 
fundamental difficulty which will probably make correla- 
tions between persons in the general case impossible to 
calculate. In certain situations, however, it does not arise, 
namely where each person can put the " tests " in an 
order of preference according to some criterion or judgment 
(Stephenson, 1935b), and it is with cases of this kind 
that we shall deal in the first place. Usually the " tests " 
here are not really different tests like those named above, 
but are perhaps a number of children's essays which have 
to be placed in order of merit, or a number of pictures in 
order of aesthetic preference, or a number of moods which 
the subject has to number, indicating the frequency of 
their occurrence in himself. Indeed, the subject might not 
only give an order of preference to, say, the essays, but 
might give them actual marks, and there would be no 
absurdity in averaging the column of such marks, or in 
correlating two such columns, made by different persons. 

Such a correlation coefficient would show the degree of 
resemblance between the two lists of marks given to the 
children, or given to a set of pictures according to their 
aesthetic value. It would indicate, therefore, a resemblance 
between the minds of the two persons who marked the 
essays or judged the pictures. A matrix of correlations 
between several such persons might look exactly like the 
matrices of correlations between tests which occur in 
Parts I and II, and could be analysed in any of the same 
ways. What would the " factors " which resulted from 
such an analysis mean when the correlations were between 
persons ? Take an imaginary hierarchical case first. 

3. The two sets of equations. In test analysis the common 
factor found was taken to be something called into play 
by each test, the different tests being differently loaded 
with it. The test was represented by an equation such 
as 

$$z_4 = .6g + .8s_4$$

For each of the numerous persons who formed the subjects 
of the testing, an estimate was made of his g, and 
another estimate could be made of his $s_4$. The different 
tests were combined into a weighted battery for this 
purpose of estimating a man's amount of g. His score in 
Test 4 would then be made up of his g and $s_4$ inserted in 
the above specification equation. 

$$z_{4.9} = .6g_9 + .8s_{4.9}$$

would be the score of the ninth person in Test 4. 

By analogy, when we analyse a matrix consisting of 
correlations between persons, we arrive at a set of equations 
describing the persons in terms of common and specific 
factors. Corresponding to a hierarchical battery of tests, 
we could conceivably have a hierarchical team of persons, 
from which we would exclude any person too similar to 
one already included. Each person in the hierarchical 
team would then be made up of a factor he shared with 
everyone else in the team, and a specific factor which was 
his own idiosyncrasy. An equation like 



would now specify the composition of the ninth person. 
g' is something all the persons have, $s_9'$ is peculiar to 
Person 9. The loadings now describe the person, and the 
amount of g' " possessed " or demanded by each test can 
be estimated by exactly the same techniques employed in 
Part I. The score which Test 4 would elicit from Person 9 
would be obtained by inserting the g' and $s_9'$ " possessed " 
by that test into the specification equation of Person 9, 
giving 



This equation is to be compared with the former equation 

$$z_{4.9} = .6g_9 + .8s_{4.9}$$

Both equations ultimately describe the same score, but 
$z_{9.4}$ is not identical with $z_{4.9}$. The raw score X is the same, 
but the one standardized z is measured from a different 
zero, and in different units, from the other. Disregarding 
this for the moment, we see that with the exchange of 
rôles of tests and persons, the loadings and the factors have 
also changed rôles. Formerly, persons possessed different 
amounts of g, and tests were differently loaded with it. 
Now, tests possess different amounts of g', and persons are 
differently loaded with it. We feel impelled to inquire 
further into the relationships of these complementary 
factors and loadings. 
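That the same raw score yields two different standardized scores can be seen in a small sketch. The marks below are invented, and the population standard deviation is assumed as the unit in both directions; the same cell is standardized once along its test (row) and once along its person (column).

```python
from statistics import mean, pstdev

# Hypothetical marks: rows are tests, columns are persons.
marks = [
    [12, 15, 9, 18],
    [14, 11, 10, 17],
    [8, 13, 12, 15],
    [16, 12, 7, 13],
]

def standardize(values):
    """Measure values from their mean, in units of their standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Standardizing along each test (row): the z of ordinary test analysis.
z_by_test = [standardize(row) for row in marks]

# Standardizing along each person (column): the z of person analysis.
columns = [list(col) for col in zip(*marks)]
z_columns = [standardize(col) for col in columns]
z_by_person = [list(row) for row in zip(*z_columns)]

test, person = 3, 0   # one and the same raw mark
print(marks[test][person], z_by_test[test][person], z_by_person[test][person])
```

The raw mark is identical in both tables; only the zero and the unit from which it is measured differ.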

The test which is most highly saturated with g is that 
one which, in terms of Spearman's imagery, requires most 
expenditure of general mental energy, and is least depen- 
dent upon specific neural engines. It correlates more 
with its fellow-members of the hierarchical battery than 
any other test among them does. It represents best what 
is common to them all. 

The man, in a hierarchical team of men, who is most 
highly saturated with g' is that man who is most like all 
the others. His correlations with them are higher than is 
the case for any other man in the team. He is the indi- 
vidual who best represents the type. But a nearer ap- 
proach to the type can be made by a weighted team of men, 
just as formerly we weighted a battery of tests to estimate 
their common factor. 

4. Weighting examiners like a Spearman battery. Corre- 
lations of this kind between persons were used long before 
any idea of what Stephenson has called " inverted factorial 
analysis " was present. The author and a colleague found 
in the winter of 1924-5 a number of correlations between 
experienced teachers who marked the essays written by 
fifty schoolboys upon " Ships " (Thomson and Bailes, 
1926). One table or matrix of such correlations, between 
the class teacher and six experienced head masters who 
marked the essays independently of one another, was as 
follows : 

[Table of correlations not reproduced.] 

In the article in question, these different markers were 
compared by correlating each with the pool of all the rest. 
These correlations are shown in the first row of the table 
below. 

Purely as an illustrative example, let us make also an 
approximate analysis of this matrix, and take out at any 
rate its chief common factor. On the assumption that it 
is roughly hierarchical, we can use Spearman's formula 

$$\text{Saturation} = \sqrt{\frac{A^2 - A'}{T - 2A}}$$

More easily we can insert its largest correlation coefficient 
as an approximate communality for each test, and find 
Thurstone's approximate first-factor loadings (see Chapter 
II, page 24). We get for the saturations or loadings the 
second and third rows of this table : 



                                Te     A     B     C     D     E     F
Correlation with pool of rest   .77   .67   .76   .73   .76   .75   .82
Spearman saturations            .814  .704  .796  .766  .798  .788  .861
Thurstone method                .81   .73   .80   .78   .80   .80   .85



We see that F is the most " typical " examiner of these 
essays, in the sense that he is more highly saturated with 
what is common to all of them ; while A conforms least 
to the herd. 
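The two ways of obtaining saturations can be sketched as follows. This is an illustration under stated assumptions, not the calculation of the article: the correlation table is invented (built to be exactly hierarchical from known saturations), A is read as the sum of one marker's correlations with the others, A' as the sum of their squares, and T as the sum of all the A's; the Thurstone approximation is taken to be the column total, with the largest correlation standing in the diagonal, divided by the square root of the grand total.

```python
from math import sqrt

def spearman_saturations(r):
    """Saturation of each variable, sqrt((A^2 - A') / (T - 2A)).

    A  = sum of the variable's correlations with all the others,
    A' = sum of the squares of those correlations,
    T  = sum of the A's over the whole table (assumed reading of T).
    """
    n = len(r)
    a = [sum(r[i][j] for j in range(n) if j != i) for i in range(n)]
    a_sq = [sum(r[i][j] ** 2 for j in range(n) if j != i) for i in range(n)]
    t = sum(a)
    return [sqrt((a[i] ** 2 - a_sq[i]) / (t - 2 * a[i])) for i in range(n)]

def thurstone_first_loadings(r):
    """Approximate first-factor loadings: the largest correlation in each
    column serves as the communality, and each column total is divided by
    the square root of the grand total."""
    n = len(r)
    filled = [row[:] for row in r]
    for i in range(n):
        filled[i][i] = max(r[i][j] for j in range(n) if j != i)
    col_totals = [sum(filled[i][j] for i in range(n)) for j in range(n)]
    grand = sum(col_totals)
    return [c / sqrt(grand) for c in col_totals]

# Hypothetical, exactly hierarchical table built from known saturations.
true = [0.9, 0.8, 0.7, 0.6, 0.5]
r = [[1.0 if i == j else true[i] * true[j] for j in range(5)] for i in range(5)]

spearman = spearman_saturations(r)
thurstone = thurstone_first_loadings(r)
print([round(s, 3) for s in spearman])
print([round(s, 3) for s in thurstone])
```

On an exactly hierarchical table the first function returns the saturations it was built from; the second is only approximate, because the largest correlation in a column is not the true communality.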

* See Chapter IX, page 154. 

With the same formula which in Part I we used to estimate 
a man's g from his test-scores, we could here estimate 
an essay's g' from its examiner scores. That is to say, the 
marks given by the different examiners would be weighted 
in proportion to the quantities 

$$\frac{\text{Saturation with } g'}{1 - \text{saturation}^2}$$

where g' is that quality of an essay which makes a common 
appeal to all these examiners. Their marks (after being 
standardized) would therefore be weighted in the proportions 
$.814/(1 - .814^2)$, etc., that is : 

      Te     A     B     C     D     E     F
     2.41  1.40  2.17  1.85  2.20  2.08  3.33
or    .72   .42   .65   .56   .66   .63  1.00

to make global marks for the essays, which could then be 
reduced to any convenient scale. If this were done, the 
result would be the " best " estimate * of that aspect or 
set of aspects of the essay which all these examiners are 
taking into account, disregarding all that can possibly be 
regarded as idiosyncrasies of individual examiners. 
Whether we think it the best estimate in other senses is a 
matter of subjective opinion. We may wish the " idiosyn- 
crasies " (the specific, that is) of a certain examiner to be 
given great weight. It clearly would not do, for example, 
to exclude Examiner A from the above team merely because 
he is the most different from the common opinion of the 
team, without some further knowledge of the men and the 
purpose of the examination. The " different " member in 
a team might, for example, be the only artist on a com- 
mittee judging pictures, or the only Democrat in a court 
judging legal issues, or the only woman on a jury trying 
an accused girl. But in non-controversial matters, if all 
are of about equal experience, it is probable that this 
system of weighting, restricting itself to what is certainly 
common to all, will be most generally acceptable as 
fairest. 

* Best whether we adopt the regression principle or Bartlett's. 
For if only one " common factor " is estimated, the difference is 
one of unit only, and the weighting in the text is the " best " on 
both systems. 
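The weights of Section 4 follow directly from the Spearman saturations tabulated there. A minimal sketch, in which the function `global_mark` and its argument are illustrative names only and the marks are assumed to have been standardized beforehand:

```python
# Spearman saturations of the seven markers, as tabulated in Section 4.
saturations = {"Te": 0.814, "A": 0.704, "B": 0.796, "C": 0.766,
               "D": 0.798, "E": 0.788, "F": 0.861}

# Weight for each marker's standardized marks: saturation / (1 - saturation^2).
weights = {k: s / (1 - s * s) for k, s in saturations.items()}

# The same weights expressed relative to the largest.
top = max(weights.values())
relative = {k: w / top for k, w in weights.items()}

def global_mark(standard_marks):
    """Weighted sum of one essay's standardized marks, keyed by marker."""
    return sum(weights[k] * z for k, z in standard_marks.items())

for k in saturations:
    print(f"{k:>2}  {weights[k]:.2f}  {relative[k]:.2f}")
```

The resulting global marks may afterwards be reduced to any convenient scale, since only the proportions of the weights matter.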
5. Example from " The Marks of Examiners." This 
form of weighting examiners' marks has probably never 
yet been used in practice. But it has been employed, by 
Cyril Burt, in an inquiry into the marks given by examiners 
(Burt, 1936). As an example, we take the marks given 
independently by six examiners to the answer papers of 
fifteen candidates aged about 16, in an examination in 
Latin. (The example is somewhat unusual, inasmuch as 
these candidates were a specially selected lot who had all 
been adjudged equal by a previous examiner, but it will 
serve as an illustration if the reader will disregard that 
fact.) The marks were (op. cit., 20) : 

[Table of marks not reproduced.] 

The correlations between the examiners calculated from 
this table are (the examiner with the highest total correlation 
leading) : 

[Table of correlations not reproduced.] 

If, assuming this table to be hierarchical, we find each 
examiner's saturation with the common factor by Spearman's 
formula, we obtain (with Professor Burt, op. cit., 
294) : 

      F     A     B     E     D     C
     .95   .92   .91   .87   .84   .72

In the sense, therefore, of being most typical, F is here 
the best examiner. The proportionate weights to be given 
to each examiner, in making up that global mark for the 
candidate which will best agree with the common factor of 
the team of examiners, are, as before 

$$\frac{\text{Saturation}}{1 - \text{saturation}^2}$$

provided the marks have first been standardized. The 
resulting weights, giving F the weight unity, are : 

      F     A     B     E     D     C
     1.00  .61   .54   .37   .29   .15

(If the weights are to be applied to the raw or unstan- 
dardized marks, they must each be divided by that 
examiner's standard deviation.) 

The marks thus obtained are only an estimate of the 
" true " common-factor mark for each child, just as was 
the case in estimating Spearman's g ; and the correlation 
of these estimates with the " true " (but otherwise undiscoverable) 
mark will be, as there (Chapter VII, page 106) 

$$r_m = \sqrt{\frac{S}{1 + S}}$$

where S is the sum of all the six quantities 

$$\frac{\text{Saturation}^2}{1 - \text{saturation}^2}$$

In our case this gives 

$$r_m = .98$$

The best examiner's marking itself correlated with the 
hypothetical " true " mark to the amount .95, so that 
the improvement is not worth the trouble of weighting, 
especially as the simple average of the team of examiners 
gives .97. But in some circumstances the additional 
labour might be worth while, and there is an interest in 
knowing which examiners conform least and which most 
to the team, and having a measure of this. 
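The figures of this section can be recomputed from the six saturations alone. The sketch below uses only the saturations quoted above; the variable names are illustrative.

```python
from math import sqrt

# Saturations of the six Latin examiners, as quoted in the text.
saturations = {"F": 0.95, "A": 0.92, "B": 0.91, "E": 0.87, "D": 0.84, "C": 0.72}

# S = sum of saturation^2 / (1 - saturation^2) over the team.
S = sum(s * s / (1 - s * s) for s in saturations.values())

# Correlation of the weighted global mark with the "true" common-factor mark.
r_m = sqrt(S / (1 + S))

# Weights saturation / (1 - saturation^2), expressed relative to F.
w = {k: s / (1 - s * s) for k, s in saturations.items()}
relative = {k: round(v / w["F"], 2) for k, v in w.items()}

print(round(r_m, 2), relative)
```

The heavy weight of F arises because the divisor 1 - saturation² shrinks rapidly as the saturation approaches unity.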

After the saturation of each examiner with the hypothet- 
ical common factor has been found, the correlations due 
to that factor can be removed from the table exactly as 
in analysing tests in Chapter II, pages 27 and 28, or in 
Chapter IX, page 155. The residues, as there, may show 
the presence of other factors ; and " specific " resem- 
blances or antagonisms between pairs of examiners, or 
minor factors running through groups of examiners, may 
be detected and estimated. 
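Removing the common factor amounts to subtracting from each correlation the product of the two saturations concerned. A sketch with an invented team of four examiners follows; the saturations are taken as known rather than re-estimated, and an extra resemblance of .10 between the third and fourth examiners is assumed in order that a residue may appear.

```python
def residuals(r, saturations):
    """Correlations left after removing one common factor: r_ij - l_i * l_j."""
    n = len(r)
    return [[0.0 if i == j else r[i][j] - saturations[i] * saturations[j]
             for j in range(n)] for i in range(n)]

# Hypothetical examiners: one common factor, plus an extra resemblance
# of .10 between the third and fourth examiners.
l = [0.9, 0.8, 0.7, 0.6]
r = [[1.0 if i == j else l[i] * l[j] for j in range(4)] for i in range(4)]
r[2][3] += 0.10
r[3][2] += 0.10

res = residuals(r, l)
for row in res:
    print([round(x, 2) for x in row])
```

Only the pair sharing the extra resemblance leaves a residue; all the other residues vanish.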

In short, all the methods of Parts I and II of this book 
there used on correlations between tests may be employed 
on correlations between examiners. The tests have come 
alive and are called examiners, that is all. But since the 
child's performance, judged by the different examiners 
differently, is here nevertheless the same identical per- 
formance, our interpretation of the results is different. 
The two cases throw light on one another. A Spearman 
hierarchical battery of tests may estimate each child's 
general intelligence, which is there something in common 
among the tests. The examiners may have been instructed 
to mark exclusively for what they think is general intelli- 
gence. In that case their weighted team will estimate 
for each child a general intelligence, which is something 
in common among the somewhat discrepant ideas the 
examiners hold on this matter. 

6. Preferences for school subjects. In the previous sec- 
tions we have discussed correlations between examiners 
who all mark the same examination papers. The purpose 
of their marking these papers is to award prizes, distinc- 
tions, passes, and failures to the candidates. The exam- 
iners are a means to this end ; the reason for employing 
several of them is to obtain a list of successes and failures 
in which we can have greater confidence. The technique 
described is one which enables us to combine their marks, 
on certain assumptions, to greatest advantage. But it 
can, as in the inquiries described in The Marks of Examiners, 
be turned to compare individual examiners, and to evaluate 
the whole process of examining. 

It is only a step to another, very similar, experiment in 
which objects evaluated by the " examiners " are not the 
works of candidates in an examination, but are objects 
chosen for the express purpose of gaining an insight into 
the minds of those asked to judge them. Thus we might 
ask several persons each to evaluate on some scale the 
aesthetic appeal of forty or fifty works of art (Stephenson, 
1936b, 353), or ask a number of school pupils each to place 
in order of interest a list of school subjects. 

Stephenson (1936a) asked forty boys and forty girls 
attending a higher school in Surrey, England, thus to 
place in order of their preference twelve school subjects 
represented by sixty examination papers, and calculated 
for about half these pupils the correlation coefficients 
between them. To explain the kind of outcome that may 
be expected from such an experiment it will be sufficient 
for us to quote his data for a smaller number of pupils, 
say eight girls, avoiding anomalous cases for simplicity in 
a first consideration. The correlations between them were 
as follows (op. cit., 50) : 

[Table of correlations between the eight girls not reproduced.] 



This table at once suggests that these girls fall into two 
types. Girls 3, 4, 5, and 7 correlate positively among 
themselves ; they have somewhat similar preferences 
among school subjects. Girls 17, 18, 19, and 20 correlate 
positively among themselves. But the two groups correlate 
negatively with one another. The two types were different 
in their order of preference, Type I tending, for example, 
to put English and French higher, and Physics and 
Chemistry lower, than Type II (though both were agreed 
that Latin was about the least lovable of their studies !). 
7. A parallel with a previous experiment. This experiment, 
it will be seen, forms a parallel to that inquiry (also 
by Stephenson) described in Chapter I, Section 9, where 
tests fell into two types, verbal and pictorial, with correla- 
tions falling there as here into four quadrants. If we call 
the two types of school pupil here the linguistic (L) and 
the scientific (S), and again use C for the cross-correlations, 
the diagram corresponding to that on page 16 of Chapter I 
is : 

        L   C
        C   S



The chief difference between the two cases is that there 
the cross-correlations, though smaller than hierarchical 
order in the whole table would demand, were nevertheless 
positive. Here, however, the cross-correlations are 
actually negative. 

It is true that the signs of all the correlations in the C 
quadrants can in either case be reversed, by reversing the 
order of the lists either of all the earlier or all the later 
variables (there tests, here pupils). But that is not really 
permissible in either case. We have no doubt which is 
the top and which the bottom end of a list of marks, 
whether in a verbal test or a pictorial test ; and to reverse 
the order of preference given by either the linguistic or the 
scientific pupils would be simply to stultify the inquiry. 
There is, therefore, a real difference between the cases. 
In the present set of correlations something is acting as an 
" interference factor." 

In Chapter I we explained the correlations and their 
tetrad-differences by the hypothesis of three uncorrelated 
factors g, v, and p required in various proportions by the 
tests, and possessed in various amounts by the children. 
The loadings which indicated the proportions of the factors 
in each test we tacitly assumed to be all positive. Thurstone 
expressly says that it is contrary to psychological 
expectation to have more than occasional negative loadings. 

8. Negative loadings. Let us endeavour to make at least 
a qualitative scheme of factors to express the correlations 
between the pupils, factors possessed in various amounts 
by the subjects of the school curriculum, and demanded 
in various proportions by each pupil before he will call 
the subject interesting. One type of pupil weights heavily 
the linguistic factor in a subject in evaluating its interest 
to him. The other type weights heavily the scientific 
factor in a subject in judging its attraction for him. But 
to explain actual negative correlations between pupils we 
must assume that some of the loadings are negative, 
assume, that is, that some of the children are actively 
repelled by factors which attract others. Common sense 
does not think thus. Common sense says that two children 
may put the subjects in opposite orders, even though they 
both like them all, provided they don't like them equally 
well. But then common sense is not anxious to analyse 
the children into uncorrelated additive factors. If each 
child is thus expressed as the weighted sum of various 
factors, two children can correlate negatively only if some 
of the loadings are negative in the one child and positive 
in the other, for the correlation is the inner product of the 
loadings. Since Stephenson has found numerous nega- 
tive correlations between persons, and since few negative 
correlations are reported between tests, we seem here to 
have an experimental difference between the two kinds of 
correlation, and if ever correlations between persons come 
to be analysed as minutely and painstakingly as correla- 
tions between tests, it would seem that the free admission 
of negative loadings would be necessary.* The present 
matrix can in fact be roughly analysed into two general 
factors, one of which has positive loadings in all pupils, 
while the other is positively loaded in the one type, 
negatively loaded in the other. 
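The argument that negative correlations require negative loadings can be illustrated by a small sketch. The loadings are invented: every pupil is given a positive loading on a general factor and a loading on a second, bipolar factor whose sign depends on the type, and the correlation between two pupils is taken as the inner product of their common-factor loadings.

```python
# Hypothetical loadings on two uncorrelated common factors:
# a general factor (positive in everyone) and a bipolar factor.
loadings = {
    "L1": (0.4, 0.7), "L2": (0.5, 0.6),     # "linguistic" type
    "S1": (0.4, -0.7), "S2": (0.5, -0.6),   # "scientific" type
}

def correlation(p, q):
    """Correlation implied by the common factors: inner product of loadings."""
    return sum(a * b for a, b in zip(loadings[p], loadings[q]))

within_L = correlation("L1", "L2")   # 0.4*0.5 + 0.7*0.6
within_S = correlation("S1", "S2")   # 0.4*0.5 + (-0.7)*(-0.6)
cross = correlation("L1", "S1")      # 0.4*0.4 + 0.7*(-0.7)

print(within_L, within_S, cross)
```

Were all the loadings positive, every term of every inner product would be positive, and no correlation could fall below zero.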

* See Stephenson, 1936b, 349. 

9. An analysis of moods. A still more ingenious application 
by Stephenson of correlations between persons is 
an experiment in which a " population " of thirty moods, 
such as " irascible," " cheerful," " sunny," was rated for 
prevalence and intensity for each of ten patients in a 
mental hospital, and for six normal persons (Stephenson, 
1936c, 363). This time the correlation 
table indicated three types, corresponding to the 
manic-depressives, the schizophrenes, and the normal 
persons, each type correlating positively within itself, but 
negatively or very little with the other types. These 
experiments were only illustrative, and it remains to be 
seen whether factors which will prove acceptable psycho- 
logically will be isolated in persons in the same manner as g, 
and the verbal factor, have been isolated in tests. The 
parallel between the two kinds of correlation and analysis 
is, however, certainly likely to throw light on the nature of 
factors of both kinds. 



CHAPTER XIV 

THE RELATION BETWEEN TEST FACTORS 
AND PERSON FACTORS 

1. Burt's example, centred both by rows and by columns. In 
the examples we have just considered, there is no doubt 
that correlations between persons can be calculated without 
absurdity. In the matrix of marks given by a number of ex- 
aminers (marking the same paper) to a number of candidates, 
either two candidates can be correlated, or two examiners. 
The heterogeneity of marks referred to in Chapter XIII, 
Section 1, does not enter as a difficulty. Still keeping to 
such material, let us ask ourselves what the relation is 
between factors found in the one way, and factors found in 
the other. Qualitatively, we have already suggested that 
factors and loadings change roles in some manner. The 
most determined attempt to find an exact relationship has 
been that made by Cyril Burt, who concludes that, if the 
initial units have been suitably chosen, the factors of the 
one kind of analysis are identical with the loadings of the 
other, and vice versa (Burt, 1937b). The present writer, 
while agreeing that this is so in the very special circum- 
stances assumed by Burt, is of opinion that his is a very 
narrow case, and that the factors considered by Burt are 
not typical of those in actual use in experimental psycho- 
logy. Theoretically, however, Burt's paper is of very great 
interest. It can be presented to the general reader best 
by using Burt's own small numerical example, based on a 
matrix of marks for four persons in three tests : 

                 Persons
              a    b    c    d
Test 1        6   -2    0   -4
Test 2       -3   -1    1    3
Test 3       -3    3   -1    1

It will be noticed that this matrix of marks is already 
centred both ways. The rows add up to zero, and so do 
the columns. The test scores have been measured from 
their means, and then thereafter the columns of personal 
scores have been measured from their means ; or it can 
be done persons first, tests second, the end result being 
the same. Burt does not give the matrix of raw scores 
from which the above matrix comes. 

If we take the doubly centred matrix as he gives it, the 
matrices of variances and covariances formed from it are : 

Test Covariances 

          1     2     3
    1    56   -28   -28
    2   -28    20     8
    3   -28     8    20

Person Covariances 

          a     b     c     d
    a    54   -18     0   -36
    b   -18    14    -4     8
    c     0    -4     2     2
    d   -36     8     2    26



Notice that in both these matrices the columns add to 
zero, just as they do in the matrices of residues in the 
" centroid " process. 
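Both tables can be recomputed from the doubly centred marks. In the sketch below the minus signs of the marks are restored on the assumption that every row and every column must add to zero, and the " covariances " are taken, as the figures require, to be plain sums of products without division by the number of cases.

```python
# Burt's doubly centred marks: rows are Tests 1-3, columns are Persons a-d.
# Signs are restored so that every row and every column adds to zero.
MARKS = [
    [6, -2, 0, -4],
    [-3, -1, 1, 3],
    [-3, 3, -1, 1],
]

def product_sums(rows):
    """Sums of products between every pair of rows (the text's covariances)."""
    return [[sum(x * y for x, y in zip(r, s)) for s in rows] for r in rows]

def transpose(m):
    """Exchange the rôles of rows and columns."""
    return [list(col) for col in zip(*m)]

test_cov = product_sums(MARKS)
person_cov = product_sums(transpose(MARKS))

for row in test_cov:
    print(row)
for row in person_cov:
    print(row)
```

Because the marks are centred both ways, every column of either table adds to zero.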

2. Analysis of the Covariances. Burt next proceeds to 
analyse each of these by Hotelling's method. It seems 
clear that there will exist some relation between the two 
analyses, since the primary origin of each matrix is the 
same table of raw marks, and to show that relation most 
clearly Burt analyses the covariances direct, and not the 
correlations which could be made from each table (by 
dividing each covariance by the square root of the product 
of the two variances concerned). For the two Hotelling 
analyses he obtains (and the Thurstone factors before 
rotation would here be the same) : 

Analysis of the Tests 

$$\begin{aligned}
x_1 &= 2\sqrt{14}\,\gamma_1 \\
x_2 &= -\sqrt{14}\,\gamma_1 + \sqrt{6}\,\gamma_2 \\
x_3 &= -\sqrt{14}\,\gamma_1 - \sqrt{6}\,\gamma_2
\end{aligned}$$

Analysis of the Persons 



d= 2V6/!- V2/2 

In both cases two factors are sufficient (there will always 
be fewer Hotelling or Thurstone factors than tests with 
a doubly centred matrix of marks, for a mathematical 
reason). The reader can check that the inner products 
give the covariances, e.g.

$$
\text{covariance}\,(bd) = \sqrt{6} \times 2\sqrt{6} - 2\sqrt{2} \times \sqrt{2} = 12 - 4 = 8
$$

The method of finding Hotelling loadings was described
in Chapter V, and the reader can readily check that the
coefficients of $\gamma_1$, for example, do act as required by that
method. For if we use numbers proportional to $2\sqrt{14}$,
$-\sqrt{14}$, and $-\sqrt{14}$, namely $1$, $-\tfrac{1}{2}$, $-\tfrac{1}{2}$, as Hotelling
multipliers we get :

                       56   -28   -28
                      -28    20     8
                      -28     8    20

    times   1          56   -28   -28
    times  -1/2        14   -10    -4
    times  -1/2        14    -4   -10
                      ---------------
    totals             84   -42   -42

proportional to $1$, $-\tfrac{1}{2}$, $-\tfrac{1}{2}$ as required.

The largest total (84) is the first "latent root," and the
multipliers $1$, $-\tfrac{1}{2}$, $-\tfrac{1}{2}$ have to be divided, according to
Chapter V, by the square root of the sum of their squares,
and multiplied by the square root of 84, giving

$$
2\sqrt{14}, \qquad -\sqrt{14}, \qquad -\sqrt{14}.
$$

3. Factors possessed by each person and by each test.
Burt then goes on to "estimate," by "regression
equations," the amount of the factors $\gamma$ possessed by the
persons, and the amount of the factors $f$ possessed by the
tests. There is a misuse of terms here, for with Hotelling
factors there is no need to "estimate" ; they can be
accurately calculated : but that is a small point. The first
three equations can be solved for the $\gamma$'s ; there is indeed
one equation too many, but it is consistent. And the four
equations of the second group can be solved for the $f$'s ;
again they are consistent. Since the equations are
consistent, we can choose the easiest pair in each case to solve
for the two unknowns. Choosing the two equations for
$x_1$ and $x_2$ we obtain

$$
\gamma_1 = \frac{x_1}{2\sqrt{14}}, \qquad
\gamma_2 = \frac{x_1 + 2x_2}{2\sqrt{6}}.
$$

For the other set of factors we naturally choose the
equations in $a$ and $c$, and have

$$
f_1 = -\frac{a}{3\sqrt{6}}, \qquad
f_2 = -\frac{c}{\sqrt{2}}.
$$
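The two Hotelling analyses on which these solutions rest can be checked numerically. The sketch below is only an illustration: it takes the trial multipliers as already known (proportional to the loadings of $\gamma_1$, $\gamma_2$, $f_1$, and $f_2$), confirms that each set reproduces itself when applied to the covariance table, and then rescales it into loadings as Chapter V prescribes. The function names are ours.

```python
import math

test_cov = [[56, -28, -28], [-28, 20, 8], [-28, 8, 20]]
person_cov = [[54, -18, 0, -36], [-18, 14, -4, 8], [0, -4, 2, 2], [-36, 8, 2, 26]]


def hotelling_loadings(cov, multipliers):
    """Return (latent root, loadings) for multipliers that reproduce themselves."""
    totals = [sum(c * m for c, m in zip(row, multipliers)) for row in cov]
    # The totals must be proportional to the multipliers; the ratio is the root.
    k = max(range(len(multipliers)), key=lambda i: abs(multipliers[i]))
    root = totals[k] / multipliers[k]
    assert all(abs(t - root * m) < 1e-9 for t, m in zip(totals, multipliers))
    norm = math.sqrt(sum(m * m for m in multipliers))
    return root, [m * math.sqrt(root) / norm for m in multipliers]


def reproduced(*factors):
    """Inner products of the loadings, which should give back the covariances."""
    n = len(factors[0])
    return [[sum(f[i] * f[j] for f in factors) for j in range(n)] for i in range(n)]


# Tests: multipliers proportional to the loadings of gamma_1 and gamma_2.
root_t1, load_t1 = hotelling_loadings(test_cov, [1, -0.5, -0.5])
root_t2, load_t2 = hotelling_loadings(test_cov, [0, 1, -1])
# Persons: multipliers proportional to the loadings of f_1 and f_2.
root_p1, load_p1 = hotelling_loadings(person_cov, [-3, 1, 0, 2])
root_p2, load_p2 = hotelling_loadings(person_cov, [0, 2, -1, -1])

print("latent roots:", root_t1, root_t2, root_p1, root_p2)
print("test loadings on gamma_1:", [round(x, 3) for x in load_t1])
print("covariance (bd) from loadings:", round(reproduced(load_p1, load_p2)[1][3], 6))
```

Both tables yield the same two non-zero latent roots, which is the fact on which the reciprocity discussed in the next section depends.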



Now, since we are very liable to confusion in this
discussion, let us remind ourselves what these factors $\gamma$ and
these factors $f$ are. The factors $\gamma$ are factors into which
each test has been analysed. They do not vary in amount
from test to test, but each test is differently loaded with
them. They vary in amount from person to person.

The factors $f$ are factors into which each person has been
analysed. These do not vary in amount from person to
person, but from test to test. Each person is differently
loaded with them, that is, made up of them in different
proportions. The $\gamma$'s are uncorrelated fictitious tests : the
$f$'s are uncorrelated fictitious persons.

Now, from the equations

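$$
\gamma_1 = \frac{x_1}{2\sqrt{14}}, \qquad
\gamma_2 = \frac{x_1 + 2x_2}{2\sqrt{6}}
$$

we can find the amount of each factor $\gamma_1$ and $\gamma_2$ possessed
by each person, by inserting his scores $x_1$ and $x_2$ in these
equations, scores which are given in the matrix :

              a    b    c    d
        1    -6    2    0    4
        2     3    1   -1   -3
        3     3   -3    1   -1

Thus the first person possesses $\gamma_1$ in an amount
$-6/(2\sqrt{14})$, because his $x_1$ is $-6$. For the four persons
and the two factors we find the amounts of these factors
possessed by each person to be :

$$
\begin{array}{c|cc}
\text{Factors} & \gamma_1 & \gamma_2 \\ \hline
a & -3/\sqrt{14} & 0 \\
b & 1/\sqrt{14} & 2/\sqrt{6} \\
c & 0 & -1/\sqrt{6} \\
d & 2/\sqrt{14} & -1/\sqrt{6}
\end{array}
$$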



4. Reciprocity of loadings and factors. These are the
amounts of the factors $\gamma$ possessed by the four persons. If
now the reader will compare them with the loadings of
the factors $f$ in the second set of equations on page 215,
he will see a resemblance. The signs are the same, and
the zeros are in the same places. Moreover, the resemblance
becomes identity if we destandardize the factors $f_1$ and $f_2$,
measuring the former in units $\sqrt{84}$ times as large, and the
latter in units $\sqrt{12}$ times as large, 84 and 12 being the
non-zero latent roots of both matrices. In these units let us
use $\phi_1$ and $\phi_2$ for them. The equations on page 215 giving
the analysis of the persons then become

$$
\begin{aligned}
a &= -\frac{3\sqrt{6}}{\sqrt{84}}\,\phi_1 = -\frac{3}{\sqrt{14}}\,\phi_1 \\
b &= \frac{\sqrt{6}}{\sqrt{84}}\,\phi_1 + \frac{2\sqrt{2}}{\sqrt{12}}\,\phi_2
   = \frac{1}{\sqrt{14}}\,\phi_1 + \frac{2}{\sqrt{6}}\,\phi_2 \\
c &= -\frac{\sqrt{2}}{\sqrt{12}}\,\phi_2 = -\frac{1}{\sqrt{6}}\,\phi_2 \\
d &= \frac{2\sqrt{6}}{\sqrt{84}}\,\phi_1 - \frac{\sqrt{2}}{\sqrt{12}}\,\phi_2
   = \frac{2}{\sqrt{14}}\,\phi_1 - \frac{1}{\sqrt{6}}\,\phi_2
\end{aligned}
$$

It will be seen that the loadings of $\phi_1$ and $\phi_2$ are identical
with the amounts of $\gamma_1$ and $\gamma_2$ in the table on page 217.
A similar calculation could be made comparing the amounts
of $f_1$ and $f_2$ possessed by the tests with the loadings of $\gamma_1$
and $\gamma_2$ (suitably destandardized) in the analysis of the
tests. As we said at the outset, if suitable units are chosen
for the marks and the factors, the loadings of the personal
equations are the factors of the test equations, and the
factors of the personal equations are the loadings of the
test equations. But only for doubly centred matrices of
marks. It would be wrong to conclude in general that
loadings and factors are reciprocal in persons and tests.
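The reciprocity just described can be confirmed in a few lines. The sketch below assumes the doubly centred marks and the person loadings quoted earlier in the chapter, and takes "destandardizing" to mean dividing the loadings of $f_1$ and $f_2$ by $\sqrt{84}$ and $\sqrt{12}$ respectively; the variable names are ours.

```python
import math

# Doubly centred marks: rows are Tests 1-3, columns are persons a-d.
marks = [[-6, 2, 0, 4], [3, 1, -1, -3], [3, -3, 1, -1]]
x1, x2 = marks[0], marks[1]

# Amounts of gamma_1 and gamma_2 possessed by persons a, b, c, d,
# from gamma_1 = x1 / (2 sqrt 14) and gamma_2 = (x1 + 2 x2) / (2 sqrt 6).
gamma1 = [p / (2 * math.sqrt(14)) for p in x1]
gamma2 = [(p + 2 * q) / (2 * math.sqrt(6)) for p, q in zip(x1, x2)]

# Loadings of persons a, b, c, d on the standardized factors f_1 and f_2.
f1_loadings = [-3 * math.sqrt(6), math.sqrt(6), 0.0, 2 * math.sqrt(6)]
f2_loadings = [0.0, 2 * math.sqrt(2), -math.sqrt(2), -math.sqrt(2)]

# The same loadings after destandardizing by the roots 84 and 12.
phi1_loadings = [v / math.sqrt(84) for v in f1_loadings]
phi2_loadings = [v / math.sqrt(12) for v in f2_loadings]

for person, g1, p1, g2, p2 in zip("abcd", gamma1, phi1_loadings, gamma2, phi2_loadings):
    print(person, round(g1, 4), round(p1, 4), round(g2, 4), round(p2, 4))
```

Each printed row shows the amount of a $\gamma$ beside the corresponding loading of a $\phi$; the pairs agree to rounding.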

Indeed, even for doubly centred matrices of marks, this 
simple reciprocity holds only for the analysis of the 
covariances and not for analyses of the matrices of corre- 
lations. Except by pure accident (and as it happens, 
Burt's example is in the case of test correlations such an 
accident), the saturations of the correlation analysis will not 
be any simple function of the loadings of the covariance 
analysis. 

5. Special features of a doubly centred matrix. But in 
any case, a matrix of marks which has been centred both 
ways is one in which only a very special kind of residual 
association between the variables is present. Most of what 
we commonly call the association or resemblance between 
either tests or persons, the amount of which we gauge by 
the correlation coefficient, is due to something over and 
above this. We can write down an infinity of possible raw
matrices from which Burt's doubly centred matrix might
have come. To the rows of the latter matrix we can add
any quantities we like without in the slightest altering the
correlations between the tests, but making enormous
changes in the correlations between the persons. Let us,
for example, add 10 to the top row, 13 to the middle row,
and 16 to the bottom row. There results the matrix :

              a    b    c    d
        1     4   12   10   14
        2    16   14   12   10        (A)
        3    19   13   17   15

This gives as correlations between the persons :

               a      b      c      d
        a    1.00    .75    .84   -.14
        b     .75   1.00    .28   -.76
        c     .84    .28   1.00    .42
        d    -.14   -.76    .42   1.00



Next, without changing this matrix of correlations
between persons in the slightest, we can add any quantities
we like to the columns of the matrix of marks, and produce
an infinity of different matrices of correlations between
tests. If, for example, we add 5, 2, 8, and 9 to the four
columns, we have a matrix of raw marks :

              a    b    c    d
        1     9   14   18   23
        2    21   16   20   19        (B)
        3    24   15   25   24

This has the same correlations between persons, but the
correlations between tests are now :

               1      2      3
        1    1.00   -.16    .24
        2    -.16   1.00    .92
        3     .24    .92   1.00

Or instead, by adding suitable numbers to the columns
and to the rows, we might have arrived at the matrix :

              a    b    c    d
        1    44   48   18   10
        2    63   57   27   13        (C)
        3    58   48   24   10

or equally well at :

              a    b    c    d
        1    35   45   37   43
        2    34   34   26   26        (D)
        3    34   30   28   28

The order of merit of the persons in each test is quite
different in each of these matrices. The order of difficulty
of the tests for each person is quite different in each. If
we consider the ordinary correlation between Tests 1 and 2,
we find that it is negative in (B), zero in (D), and positive
in (C), yet all of these matrices reduce to Burt's matrix
when centred both ways. It is clear that they contain
factors of correlation which are absent in the doubly
centred matrix.
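These statements are easily verified. The sketch below centres each of the four raw matrices both ways, confirms that every one reduces to Burt's matrix, and prints the ordinary correlation between Tests 1 and 2 in each. It is an illustration only; exact fractions are used for the centring so that the comparison is free from rounding, and the helper names are ours.

```python
from fractions import Fraction
import math

burt = [[-6, 2, 0, 4], [3, 1, -1, -3], [3, -3, 1, -1]]
raw = {
    "A": [[4, 12, 10, 14], [16, 14, 12, 10], [19, 13, 17, 15]],
    "B": [[9, 14, 18, 23], [21, 16, 20, 19], [24, 15, 25, 24]],
    "C": [[44, 48, 18, 10], [63, 57, 27, 13], [58, 48, 24, 10]],
    "D": [[35, 45, 37, 43], [34, 34, 26, 26], [34, 30, 28, 28]],
}


def centre_rows(m):
    """Measure every row from its own mean."""
    out = []
    for row in m:
        mean = Fraction(sum(row), len(row))
        out.append([x - mean for x in row])
    return out


def double_centre(m):
    """Centre by rows (tests), then by columns (persons)."""
    by_rows = centre_rows(m)
    by_cols = centre_rows([list(col) for col in zip(*by_rows)])
    return [list(row) for row in zip(*by_cols)]


def correlation(x, y):
    """Ordinary product-moment correlation of two rows of marks."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


r12 = {}
for label, m in raw.items():
    reduces = double_centre(m) == burt
    r12[label] = correlation(m[0], m[1])
    print(label, "reduces to Burt's matrix:", reduces, " r(1,2) =", round(r12[label], 2))
```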

The averages of the rows and the columns of (C) are as
follows :

                   a    b    c    d    Average
        1         44   48   18   10      30
        2         63   57   27   13      40
        3         58   48   24   10      35
        Average   55   51   23   11





The correlation between two tests is clearly influenced 
very much by the fact that here the person a is so much 
cleverer than the person d. Similarly, the correlation 
between two persons is influenced by the fact that Test 1 
is more difficult than Test 2. As soon as the matrix is 
centred both ways, all the correlation due to these and 
similar influences is almost extinguished. Centred by rows,
(C) becomes :

              a    b    c    d
        1    14   18  -12  -20
        2    23   17  -13  -27
        3    23   13  -11  -25

and all the tests are equally difficult on the average.
Centred by columns as well, it becomes :

              a    b    c    d
        1    -6    2    0    4
        2     3    1   -1   -3
        3     3   -3    1   -1

and not only are all the tests equally difficult on the average, 
but all the persons are equally clever on the average. It 
is to the covariances still remaining that Burt's theorem 
about the reciprocity of factors and loadings applies. It 
does not apply to the full covariances of the matrix centred 
only one way, in the manner usually meant when we speak 
of covariances or of correlations. 

6. An actual experiment. Since the first edition of this 
book, Burt's The Factors of the Mind has appeared 
(London, 1940). In Part I Burt discusses with keen 
penetration the logical and metaphysical status of factors, 
concluding " that factors as such are only statistical 
abstractions, not concrete entities." Part II discusses the 
connexion between the different methods of factor analy- 
sis ; and appendices give worked examples of Burt's 
special methods of calculation. His principle of reci- 
procity of tests and persons is seen in an actual illustrative 
experiment, in his Part III on the distribution of tempera- 
mental types. 

This experiment was on twelve women students, 
selected because the temperamental assessments made by 
various judges on them were more unanimous than in the 
case of the other students. Each, therefore, was a well- 
marked temperamental type. They were assessed for the 
eleven traits seen in the table below. The assessments 
over each trait were standardised, i.e. measured in such 
units and from such an origin that their sum was zero and 
the sum of their squares twelve, the number of persons, 
so that the group was (artificially) made equal in an 
average of sociability, sex, etc. The correlations between 
the traits were then calculated and centroid factors taken 
out, the first two of which I shall call by the Roman letters 
u and v. These two are possessed in some amount by
each of the persons, and required, in degrees indicated by
the saturation coefficients, by each of the traits. These
saturation coefficients have been found by analysis of the
correlations between the traits.

Now according to the reciprocity principle, if we analyse
instead the correlations between the persons, find factors
which we may indicate by Greek letters, and measure the
amounts of these possessed by the eleven traits, these
amounts ought to be the same as the saturation coefficients
of the Roman factors u, v, etc.

Burt therefore further standardizes the assessments, 
by persons this time, and finds the total scores on each 
trait, which are, by a property of centroid factors (see 
page 100) proportional to the amounts of a centroid Greek 
factor possessed by the eleven traits ; and the test of the 
reciprocity hypothesis is to see whether these totals are 
similar to the saturations of a Roman factor. The figures 
(from Burt's page 405) are given in the table below : 



                      Saturations of the     Amounts of the
                        Roman factors         Greek factor
                         u         v               α

    Sociability         .671      .508            .587
    Sex                 .878      .213            .489
    Assertiveness       .827      .483            .378
    Joy                 .951      .233            .297
    Anger               .824      .241            .280
    Curiosity           .780      .268            .001
    Fear                .898     -.159           -.089
    Sorrow              .259     -.104           -.337
    Tenderness          .564     -.667           -.447
    Disgust             .830     -.490           -.489
    Submissiveness      .412     -.685           -.525



Clearly the amounts of α do not correspond to the
saturations of u ; nor should they, for a general factor
has already been eliminated by the double standardization.
They do, however, agree reasonably well with the
saturations of the second Roman factor v, and confirm Burt's
prediction that, even in this sample, and with factors
which are not exactly principal components, the
reciprocity principle would still hold approximately.
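One rough way of putting a figure on "agree reasonably well" is sketched below. The measures used (the mean absolute difference and the ordinary correlation between columns) are our own choice and are not Burt's; the figures are those of the table as read here, with the entry for Curiosity under v taken as positive, which is an assumption.

```python
import math

traits = ["Sociability", "Sex", "Assertiveness", "Joy", "Anger", "Curiosity",
          "Fear", "Sorrow", "Tenderness", "Disgust", "Submissiveness"]
u = [.671, .878, .827, .951, .824, .780, .898, .259, .564, .830, .412]
v = [.508, .213, .483, .233, .241, .268, -.159, -.104, -.667, -.490, -.685]
alpha = [.587, .489, .378, .297, .280, .001, -.089, -.337, -.447, -.489, -.525]


def mean_abs_diff(x, y):
    """Average size of the disagreement between two columns."""
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)


def correlation(x, y):
    """Ordinary product-moment correlation between two columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


for name, column in (("u", u), ("v", v)):
    print(name, "against alpha:",
          "mean |difference| =", round(mean_abs_diff(column, alpha), 3),
          " r =", round(correlation(column, alpha), 2))
```

On either measure the amounts of α lie much closer to the saturations of v than to those of u, which is all that the argument in the text requires.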



PART V 
THE INTERPRETATION OF FACTORS 



CHAPTER XV 

THE DEFINITION OF g 

1. Any three tests define a " g." This concluding part will 
be devoted to an attempt to answer the questions: " What 
are factors ? What is their psychological and physio- 
logical interpretation ? On what principles are we to 
decide between the different possible analyses of tests (and 
persons) ? " It may seem strange to have deferred these 
considerations so long, and to have discussed methods of 
analysing tests, and of estimating factors, before asking 
explicitly what they mean. But that is how "factors " 
have arisen. Whatever else they are, they certainly are 
not things which can be identified with clearness first, and 
discussed and measured afterwards. Their definition and 
interpretation arise out of the attempt to measure them. 
We shall begin by discussing, in the present chapter, the 
definition and nature of g. 

It will be remembered that the idea of g arose out of 
Professor Spearman's acute observation that correlation 
coefficients between tests tend to show hierarchical order : 
that is, that their tetrad-differences tend to be zero or small ; 
or in more technical terms still, that the rank to which a 
matrix of correlation coefficients can be " reduced " by 
suitable diagonal elements tends towards rank one. This 
fundamental fact is at the basis of all those methods of 
factorial analysis which magnify specific factors, and a 
reason for it, based on the idea that it is a mathematical 
result of the laws of probability, will be advanced in 
Chapter XX. In consequence of this fundamental fact, 
correlation coefficients between a number of variables can 
be adequately accounted for by a few common factors. To
be adequately described by one only, a g, the "reduced"
rank of the correlation matrix has to be one, within the
limits of sampling error.

This trouble of sampling error is very liable to obscure
the issue, and we will remove it during most of the present
chapter, as we did in Parts I and II, by supposing that we 
have defined our population (say all adult Scots, or all men, 
for that matter) and have tested every one of them. 

Suppose now that we have three tests and have, in this 
whole population, measured their correlation coefficients :

$$
\begin{array}{c|ccc}
  & 1 & 2 & 3 \\ \hline
1 & 1 & r_{12} & r_{13} \\
2 & r_{12} & 1 & r_{23} \\
3 & r_{13} & r_{23} & 1
\end{array}
$$

If, as is usually the case, these coefficients are all positive,
and if each of them is at least as large as the product of the
other two, we can explain them by assuming one g and
three specifics $s_1$, $s_2$, and $s_3$. There are many other ways
of explaining them, but let us adopt this one. We have 
thereby defined a factor g mathematically (Thomson, 1935a, 
260). It is then for the psychologist to say, from a 
consideration of the three tests which define it, what name 
this factor shall bear and what its psychological description 
is. The psychologist may think, after studying the tests, 
that they do not seem to him to have anything in common, 
or anything worth naming and treating as a factor. That 
is for him to say. Let us suppose that at any rate he does 
not reject the possibility, but that he would like an oppor- 
tunity of studying other tests which (mathematically 
speaking) contain this factor, and have nothing else in 
common, before finally deciding. 

In that case the experimenter must search for a fourth 
test which, when added to these three, gives tetrad- 
differences which are zero ; and then for a fifth and further 
tests, each of which makes zero tetrad-differences with the 
tests of the pre-existing battery. This extended battery 
the experimenter would lay before the psychological judge,
to obtain a ruling whether the single common factor, of 
which it is the now extended but otherwise unaltered 
definition, is worthy of being named as a psychological 
factor. 
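The procedure of this section can be sketched in code. The example below is hypothetical throughout: the correlations are invented for illustration, the first four tests being constructed to be hierarchical and the fifth not. It shows how three tests fix the saturations of "a" g, and how a candidate test is admitted only if it makes zero tetrad-differences with the existing battery. The function names are ours.

```python
import math
from itertools import combinations


def g_saturations(r12, r13, r23):
    """Saturations of three tests with the one g that explains their correlations."""
    return (math.sqrt(r12 * r13 / r23),
            math.sqrt(r12 * r23 / r13),
            math.sqrt(r13 * r23 / r12))


def tetrad_differences(r, i, j, k, l):
    """The three tetrad differences that four tests i, j, k, l can form."""
    return (r[i][j] * r[k][l] - r[i][k] * r[j][l],
            r[i][j] * r[k][l] - r[i][l] * r[j][k],
            r[i][k] * r[j][l] - r[i][l] * r[j][k])


def conforms(r, battery, candidate, tol=1e-9):
    """True if the candidate makes only zero tetrad differences with the battery."""
    tests = list(battery) + [candidate]
    return all(abs(t) <= tol
               for quad in combinations(tests, 4)
               for t in tetrad_differences(r, *quad))


# Hypothetical correlations: tests 0-3 are hierarchical, test 4 is not.
r = [
    [1.00, 0.72, 0.63, 0.54, 0.50],
    [0.72, 1.00, 0.56, 0.48, 0.50],
    [0.63, 0.56, 1.00, 0.42, 0.10],
    [0.54, 0.48, 0.42, 1.00, 0.10],
    [0.50, 0.50, 0.10, 0.10, 1.00],
]

print("saturations of tests 0, 1, 2:",
      [round(s, 3) for s in g_saturations(r[0][1], r[0][2], r[1][2])])
print("test 3 may join the battery:", conforms(r, [0, 1, 2], 3))
print("test 4 may join the battery:", conforms(r, [0, 1, 2], 4))
```

In practice the tolerance `tol` would have to be set from the sampling error of the tetrad-differences, which the text is here supposing to be nil.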

2. The extended or purified hierarchical battery.
Mathematically, any three tests with which the experimenter
cared to begin would define "a" g, if we except temporarily
the case, to which we shall later return, of three correlation 
coefficients, one of which is less than the product of the 
other two. The experimental tester, however, might in 
some cases have great difficulty in finding further tests, to 
add to the original three, which would give zero tetrad- 
differences. Unless he could do so, it is unlikely that the 
psychological judge would accept the factor as worthy of 
a name and separate existence in his thoughts. It is, for 
example, an experimental fact that starting with three 
tests which a general consensus of psychological opinion 
would admit to have only " intelligence " as a common 
requirement, it has proved possible to extend the battery 
to comprise about a score of tests without giving any 
tetrad-differences which cannot be regarded as zero. Even 
that has not been accomplished without difficulty, and 
without certain blemishes in the hierarchy having to be 
removed by mathematical treatment. But the fact that 
with these reservations it is possible, and that psychological 
judgment endorses the opinion that each test of this battery 
requires " intelligence," is the main evidence behind the 
actual " existence " of such a factor as " g, general intelli- 
gence." It must be noted that the word " existence " 
here does not mean that any physical entity exists which 
can be identified with this g. It does mean, however, that, 
as far as the experimental evidence goes, there is some 
aspect of the causal background which acts "as if " it 
were a single unitary factor in these tests. 

The process of making such a battery of tests to define 
general intelligence (see Brown and Stephenson, 1933) has 
not in fact taken the form of choosing three tests as the 
basal definition and then extending the battery. Instead, 
a number of tests which, it was thought from previous 
experience, would act in the desired way have been taken, 
and the battery thus formed has then been purified by the 
removal of any tests which broke the hierarchy. The 
removal of such tests does not, of course, mean that they 
do not contain g, but it means that g is not their only link 
with the other tests of the battery, and that therefore they 
are unsuitable members of a set of tests intended to define g.
Further, the actual making of such a hierarchical battery
has not been accomplished under the ideal conditions 
which we have been assuming, namely, that the whole 
population has been accurately tested. There always re- 
mains some doubt, therefore, whether, without the blurring 
effect of sampling error, the hierarchy would continue to be 
near enough to perfection. But these details should not 
be allowed to obscure the simplicity of the main argument. 
The important point to note is that the experimenter has 
produced a battery of tests which is, he claims, hierarchical; 
that the mathematician assures him that such a battery 
acts " as if" it had only one factor in common (though it 
can also be explained in many other ways), and that the 
psychologist, who may be the same person as the experi- 
menter, agrees that psychologically the existence of such 
a factor as the sole link in this battery seems a reasonable 
hypothesis. 

3. Different hierarchies with two tests in common. Now, 
it must be remembered that, starting with three other 
tests, which may contain two of the former set, it may 
very well be possible to build up a different hierarchy. 
Only experiment could show whether this were possible in
each case ; there is no mathematical difficulty in the way.
Such a hierarchy would also define "a" g, but this would
be usually a different factor from the former g. If there
were three tests common to the two hierarchies, then the
two g's could be identified with one another (sampling
errors apart), and the three tests would be found to have
the same saturations with the one g as with the other. But
if only two tests were common to the two batteries this
would not in general be the case, and the different
saturations of these tests with the two g's would show that the
latter were different (Thomson, 1935a, 261-2). Under
such circumstances the psychologist has to choose. He
cannot have both these g's. Both are mathematically of
equal standing ; it is a psychological decision which has to
be made. When one g is accepted, the other, as a factor,
must then be rejected and a more complicated factorial
analysis of the second hierarchy has to be built up which
is consistent with this. A simple artificial example will
illustrate this. Suppose that four tests give a perfect
hierarchy of correlations thus :





               1      2      3      4
        1    1.00    .72    .63    .54
        2     .72   1.00    .56    .48
        3     .63    .56   1.00    .42
        4     .54    .48    .42   1.00



On the principle that the smallest possible number of 
common factors must be chosen, the analysis of these tests 
would be 

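$$
\begin{aligned}
z_1 &= .9g + .436\,s_1 \\
z_2 &= .8g + .600\,s_2 \\
z_3 &= .7g + .714\,s_3 \\
z_4 &= .6g + .800\,s_4
\end{aligned}
$$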

Suppose now that Tests 2 and 4 are brigaded with two 
other tests, 5 and 6, in a new experiment, and that the 
correlations found are : 



               2      4      5      6
        2    1.00    .48    .42    .54
        4     .48   1.00    .56    .72
        5     .42    .56   1.00    .63
        6     .54    .72    .63   1.00



This is also a perfect hierarchy, and the principle of 
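parsimony in common factors leads to the analysis

$$
\begin{aligned}
z_2 &= .6g' + .800\,s_2 \\
z_4 &= .8g' + .600\,s_4 \\
z_5 &= .7g' + .714\,s_5 \\
z_6 &= .9g' + .436\,s_6
\end{aligned}
\qquad \text{(B)}
$$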



But this analysis is inconsistent with the former, for the
saturations of $z_2$ and $z_4$ with their common link have
changed. If the factor g has been accepted as a
psychological entity, then the factor g' cannot be. To be
consistent we must begin our equations for $z_2$ and $z_4$ in the
same manner as before, and although we may split up
their specifics to link them with the new tests, the only
link between them themselves must be g. We can then
complete the analysis in various ways,* of which one is

$$
\begin{aligned}
z_2 &= .8g + \sqrt{.36}\,s_2 \\
z_4 &= .6g + .529150\,h + \sqrt{.36}\,s_4 \\
z_5 &= .525g + .463006\,h + \sqrt{.51}\,s_5 \\
z_6 &= .675g + .595294\,h + \sqrt{.19}\,s_6
\end{aligned}
$$
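The arithmetic of this example can be repeated as follows. The sketch first computes the saturations of Tests 2 and 4 in each battery from the triplet formula, showing that they change places, and then checks that an analysis keeping the original g and adding a second common factor (here called h) reproduces the correlations of the second battery. The g loadings .525 and .675 for Tests 5 and 6 are inferred here from the requirement that Test 2 be linked to them through g alone; they and the function names are our assumptions.

```python
import math

first = {(1, 2): .72, (1, 3): .63, (1, 4): .54, (2, 3): .56, (2, 4): .48, (3, 4): .42}
second = {(2, 4): .48, (2, 5): .42, (2, 6): .54, (4, 5): .56, (4, 6): .72, (5, 6): .63}


def r(table, i, j):
    """Correlation of tests i and j, whichever way round they are named."""
    return table[(min(i, j), max(i, j))]


def saturation(table, i, j, k):
    """Saturation of test i with the g defined by the triplet i, j, k."""
    return math.sqrt(r(table, i, j) * r(table, i, k) / r(table, j, k))


g_first = {2: saturation(first, 2, 1, 3), 4: saturation(first, 4, 1, 3)}
g_second = {2: saturation(second, 2, 5, 6), 4: saturation(second, 4, 5, 6)}
print("with g :", {t: round(s, 3) for t, s in g_first.items()})
print("with g':", {t: round(s, 3) for t, s in g_second.items()})

# A consistent analysis: loadings on (g, h) for Tests 2, 4, 5, 6.
loadings = {2: (.8, 0.0), 4: (.6, .529150), 5: (.525, .463006), 6: (.675, .595294)}

worst = 0.0
for (i, j), observed in second.items():
    implied = sum(p * q for p, q in zip(loadings[i], loadings[j]))
    worst = max(worst, abs(implied - observed))
print("largest discrepancy:", worst)
```

The discrepancy is of the order of the six-decimal rounding of the h loadings.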

4. A test measuring "pure g." Although the hierarchical 
battery defines a g, it does not enable it to be measured 
exactly (but only to be estimated) unless either it contains 
an infinite number of tests, or a test can be found which 
conforms to the hierarchy and has a g saturation of unity.†
In the latter case this test which is "pure g" is such that
when it is considered along with any other two tests of its
hierarchy, its correlations with them, multiplied together,
give the intercorrelation of those two with one another :
if k is the "pure" test, then

$$
r_{ik}\,r_{jk} = r_{ij},
$$

its g saturation being

$$
\sqrt{\frac{r_{ik}\,r_{jk}}{r_{ij}}} = 1.
$$

No such "pure" test of the g which is defined by the
Brown-Stephenson hierarchy of nineteen tests has yet been
found. Such a pure test, with full g saturation, must not
be confused with tests which are sometimes called tests of
pure g because they do not contain certain other factors,
in particular the verbal factor. Thus the "S.V.P."
(Spearman Visual Perception) tests are referred to by
Dr. Alexander (1935, 48) as a "pure measure of g" ; but
their saturations with g are given by him (page 107) as
.757, .701, and .736 respectively, so that in each case only
about half the variance is "g." A possible alternative to
the plan of first defining g and then seeking to improve its
estimate would be to begin with three tests satisfying the
relation

$$
r_{ik}\,r_{jk} = r_{ij}
$$

which were reasonably acceptable as a definition of general
intelligence, and give greater content to the psychological
significance of this g by discovering tests which were
hierarchical with these three. The lack of an exact
measure of what is at present called g is a serious practical
defect. Another possible way of remedying this will be
referred to below in connexion with what are there called
"singly conforming" tests. First, however, let us
consider the case where three tests are such that

$$
r_{ik}\,r_{jk} > r_{ij}.
$$

* Four tests are insufficient as a defining battery for two common
factors.

† It is understood, of course, that even such a test would give
different measures of a man's g from day to day, if the man's
performance in it varied (as it undoubtedly would) from day to day.
By measuring with exactness is meant, in this part of the text,
measurement free from the uncertainty due to the factors
outnumbering the tests. The reader is reminded that we are assuming
sampling errors to be nil, the whole population having been tested.



5. The Heywood case. In such a case the g saturation 
of the test k, if we calculate it, is greater than unity, which 
is impossible. Yet it is possible, in theory at least, to 
add tests to such a triplet to form an extended hierarchy 
with zero tetrad-differences. There can be one such case 
(but only one) in a hierarchy. We shall call them Heywood
cases, as this possibility was first pointed out by him
(Heywood, 1931). As an artificial example consider these
correlations :

                1       2       3       4       5
        1     1.000    .945    .840    .735    .630
        2      .945   1.000    .720    .630    .540
        3      .840    .720   1.000    .560    .480
        4      .735    .630    .560   1.000    .420
        5      .630    .540    .480    .420   1.000



This is a perfect hierarchy, every tetrad-difference being 
exactly zero. It is, moreover, a perfectly possible set of 
correlations, and passes the tests required for a matrix of 
correlations to be possible. For example, the determinant 
of the matrix is positive (see Chapter IV, Section 3, page
58). But when we calculate the g saturations of the tests
we find them to be :

        Test              1      2      3      4      5
        g saturation    1.05    .90    .80    .70    .60



so that a single general factor is an impossible explanation 
of this hierarchy as far as Test 1 is concerned. The 
correlations of Test 1 with the other tests are possible, and 
they give exactly zero tetrad-differences : but yet the test 
cannot be a " two-factor " test, for the correlations of the 
first row are too high to be explained in that way. 
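Both claims, that the matrix is a possible one and that Test 1 nevertheless has a saturation above unity, can be checked with the sketch below. It is an illustration only: the saturation of each test is taken from one convenient triplet (any triplet gives the same value in a perfect hierarchy), and the determinant is found by ordinary elimination. The function names are ours.

```python
import math

heywood = [
    [1.000, 0.945, 0.840, 0.735, 0.630],
    [0.945, 1.000, 0.720, 0.630, 0.540],
    [0.840, 0.720, 1.000, 0.560, 0.480],
    [0.735, 0.630, 0.560, 1.000, 0.420],
    [0.630, 0.540, 0.480, 0.420, 1.000],
]


def g_saturation(r, i):
    """Saturation of test i from the triplet formed with the first two other tests."""
    j, k = [t for t in range(len(r)) if t != i][:2]
    return math.sqrt(r[i][j] * r[i][k] / r[j][k])


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for i in range(c + 1, n):
            factor = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= factor * a[c][j]
    return det


saturations = [g_saturation(heywood, i) for i in range(5)]
det = determinant(heywood)
print("g saturations:", [round(s, 2) for s in saturations])
print("determinant positive:", det > 0)
```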

We might well have possessed the hierarchy of Tests 2, 
3, 4, and 5 first, before we discovered Test 1. We should 
then have analysed these four as follows in a two-factor 
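analysis

$$
\begin{aligned}
z_2 &= .9g + .436\,s_2 \\
z_3 &= .8g + .600\,s_3 \\
z_4 &= .7g + .714\,s_4 \\
z_5 &= .6g + .800\,s_5
\end{aligned}
$$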

We then, let us suppose, discover Test 1, with its 
impossible g saturation. We want to retain the above 
analysis for the other tests. How can we analyse Test 1
to explain its correlations with them ? We can do so in
several ways. If we give it arbitrarily the loading .955
for g, we must use the specific of each test to give the
additional correlation required. We thus arrive at the
following possible but complicated analysis of Test 1

$$
z_1 = .955g + .196\,s_2 + .127\,s_3 + .093\,s_4 + .071\,s_5 + .141\,s_1
$$
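The loadings in this analysis follow from simple arithmetic, which the sketch below repeats. For each of Tests 2 to 5 the correlation with Test 1 not accounted for by g is divided by that test's own specific loading; whatever variance of Test 1 then remains is given to its own specific. The g loading .955 is the arbitrary value chosen in the text; the variable names are ours.

```python
import math

g_others = {2: 0.9, 3: 0.8, 4: 0.7, 5: 0.6}         # g saturations of Tests 2-5
r_with_1 = {2: 0.945, 3: 0.840, 4: 0.735, 5: 0.630}  # their correlations with Test 1
g1 = 0.955                                           # arbitrary g loading given to Test 1

link = {}
for test, g in g_others.items():
    specific = math.sqrt(1 - g * g)                  # loading of the test on its own specific
    # Loading Test 1 needs on that specific to make up the rest of the correlation.
    link[test] = (r_with_1[test] - g1 * g) / specific

remaining = 1 - g1 ** 2 - sum(v * v for v in link.values())
own_specific = math.sqrt(remaining)

print({test: round(v, 3) for test, v in link.items()})
print("variance left for Test 1's own specific:", round(remaining, 4))
print("loading on its own specific:", round(own_specific, 3))
```

Carried to more decimals the last loading comes out a shade under the printed .141, the difference being one of rounding only.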

Here Test 1 is seen as containing each of the specifics 
of the four other tests, and only a small specific loading of 
its own. We have used up nearly all its variance in ex- 
plaining the correlations. Clearly there must be a limit 
to this process. If another test were added to the hier- 
archy, we might entirely exhaust the available variance of 
Test 1 in explaining its correlations. Or, indeed, the 
reader might add, we might more than exhaust it, and 
prove the impossibility of adhering to the pre-existing 
analysis. But this is not so. Such a test would only 
prove the impossibility of its own existence, if we may make
an Irish bull. Suppose, for example, a Test 6 were to turn
up with the correlations :

                1       2       3       4       5
        6      .882    .756    .672    .588    .504

Such a test, when brigaded with Tests 2, 3, 4, and 5, would
be given the analysis

$$
z_6 = .84g + .543\,s_6
$$

If now we use even the whole specific of this test as a
link with Test 1, we cannot explain the correlation .882.
We would need for that a loading of .150 for $s_6$ in Test 1,
and we have not enough variance left in Test 1 for this.
But when this happens, we find that we have allowed the 
matrix of correlations to become an impossible one. If 
we add Test 6 to our matrix and calculate its determinant, 
we find it negative, which cannot occur in practice. The 
Test 6 could not occur, if the previous five tests already 
existed. Or vice versa, if Tests 2-6 existed, the Heywood 
case given would be impossible. The rule governing its 
possible existence has been given by Ledermann, namely, 
that the g saturation of the Heywood case cannot exceed

$$
\sqrt{\frac{1 + S}{S}}
$$

where S is the quantity familiar from Spearman's formula

$$
S = \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2}
$$

for the remainder of the hierarchy (i = 2, 3, 4 . . .). If,
then, we have a large hierarchy, we shall find it impossible 
to discover a test which conforms to it and which at the 
same time has a g saturation greater than unity. If we 
have a small hierarchy containing a Heywood case, we 
shall find it impossible to discover many tests to add to it, 
except indeed by the formal device of adding tests which 
do not correlate with it at all. All these considerations 
make it appear likely that if a Heywood test can be found 
to conform to a hierarchy, the g defined by that hierarchy 
must be abandoned. The seeker for a test for pure g is 
thus in a delicate position. He wants to find a test with
full saturation of unity. But he must just hit the mark.
If the saturation exceeds unity, his whole hierarchy must
be abandoned as a definition. And even when the exact
saturation of unity has been found, there seems to be too
narrow a line dividing the perfect from the impossible, and
the reality of the g seems to be balanced on a knife edge.
In actual practice, of course, sampling errors would make
the situation less acute and could for some time be called
in to explain a certain amount of excess saturation over
unity.
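Ledermann's rule can be applied to the artificial example of this section. The sketch below computes S and the limiting saturation $\sqrt{(1+S)/S}$ for the remainder of the hierarchy, first for Tests 2 to 5 alone and then with the hypothetical Test 6 (g saturation .84) added. It assumes the saturations .9, .8, .7, .6 found earlier; the function names are ours.

```python
import math


def spearman_s(saturations):
    """S = sum of r^2 / (1 - r^2) over the ordinary tests of the hierarchy."""
    return sum(g * g / (1 - g * g) for g in saturations)


def ledermann_limit(saturations):
    """Largest g saturation a Heywood case may have alongside these tests."""
    s = spearman_s(saturations)
    return math.sqrt((1 + s) / s)


heywood_saturation = 1.05
rest = [0.9, 0.8, 0.7, 0.6]

limit_before = ledermann_limit(rest)           # Tests 2-5 only
limit_after = ledermann_limit(rest + [0.84])   # with Test 6 added

print("limit with Tests 2-5:", round(limit_before, 4),
      " Heywood case admissible:", heywood_saturation <= limit_before)
print("limit with Tests 2-6:", round(limit_after, 4),
      " Heywood case admissible:", heywood_saturation <= limit_after)
```

The limit falls as tests are added, which is why a large hierarchy leaves no room for a saturation above unity.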

6. Hierarchical order when tests equal persons in number. 
If a test cannot be found whose saturation with g is unity 
(" pure g "), the other method of measuring g exactly 
would seem to be to extend the hierarchy until it comprised 
so many tests that the multiple correlation with g

$$
r = \sqrt{\frac{S}{S + 1}}
$$

became practically unity. For S increases with the number
of tests, being the sum of the positive quantities

$$
\frac{r_{ig}^2}{1 - r_{ig}^2}.
$$
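How quickly the multiple correlation would approach unity, if tests could be added without limit, may be seen from the sketch below. It supposes, purely for illustration, a battery in which every test has the same g saturation of .7; that figure and the function name are our assumptions.

```python
import math


def multiple_correlation(saturations):
    """Multiple correlation of the battery with g, r = sqrt(S / (S + 1))."""
    s = sum(g * g / (1 - g * g) for g in saturations)
    return math.sqrt(s / (s + 1))


sizes = (1, 5, 20, 100)
values = [multiple_correlation([0.7] * n) for n in sizes]
for n, value in zip(sizes, values):
    print(n, "tests:", round(value, 4))
```

With a single test the formula returns that test's own saturation, as it should; each added test raises the correlation, but always by less than the one before.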

There is here a point of some theoretical interest, namely, 
what happens when we have increased the number of 
hierarchical tests until they are as numerous as the persons 
to whom they are given ? This, in view of the difficulty of 
finding tests to add to a hierarchy, is admittedly not a 
question likely to trouble experimenters, but its theoretical 
implications are considerable. 

It can be shown that whenever we have a matrix of 
correlations based upon the same number of tests as 
persons, its determinant is zero. Now the determinant of 
a hierarchical matrix (with unity in each diagonal cell) 
can be shown to be of the form 

$$
\begin{aligned}
 & (1 - r_{1g}^2)(1 - r_{2g}^2)(1 - r_{3g}^2)(1 - r_{4g}^2) \\
 & + r_{1g}^2 (1 - r_{2g}^2)(1 - r_{3g}^2)(1 - r_{4g}^2) \\
 & + (1 - r_{1g}^2)\, r_{2g}^2 (1 - r_{3g}^2)(1 - r_{4g}^2) \\
 & + (1 - r_{1g}^2)(1 - r_{2g}^2)\, r_{3g}^2 (1 - r_{4g}^2) + \dots
\end{aligned}
$$

and it is clear that each of these quantities is positive
unless we have a case of pure g, or a Heywood case. A
case of pure g will leave one of the rows of the above sum
non-zero. To make the whole sum zero, one case must be
a Heywood case, giving

$$ 1 - r_{ig}^2 $$

negative.
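The form of this determinant, and the need for a Heywood
case before it can vanish, can be checked numerically. The
sketch below is an illustration only : the function names
and the example saturations are assumptions, and the
determinant is found by ordinary Gaussian elimination
rather than by any method used in the text.

```python
from math import prod, sqrt


def hierarchical_matrix(saturations):
    """Correlation matrix of a perfect hierarchy: r_ij = a_i * a_j, unit diagonal."""
    n = len(saturations)
    return [[1.0 if i == j else saturations[i] * saturations[j]
             for j in range(n)] for i in range(n)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for k in range(col + 1, n):
            f = a[k][col] / a[col][col]
            for c in range(col, n):
                a[k][c] -= f * a[col][c]
    return det


def determinant_by_formula(saturations):
    """The sum of products displayed in the text, for any number of tests."""
    u = [1.0 - a * a for a in saturations]      # the quantities 1 - r_ig^2
    total = prod(u)
    for i, a in enumerate(saturations):
        total += a * a * prod(u[:i] + u[i + 1:])
    return total


def heywood_saturation(others):
    """Saturation (greater than 1) of one added test that makes the determinant zero."""
    S = sum(a * a / (1.0 - a * a) for a in others)
    return sqrt((1.0 + S) / S)


ordinary = [0.8, 0.7, 0.6, 0.5]                 # hypothetical saturations
print(determinant(hierarchical_matrix(ordinary)), determinant_by_formula(ordinary))

extra = heywood_saturation([0.6, 0.5])
print(extra, determinant(hierarchical_matrix([extra, 0.6, 0.5])))
```

The helper `heywood_saturation` follows from setting the sum
to zero : the added test must contribute -(1 + S) to the sum
of the quantities r^2 / (1 - r^2), which requires r^2 to
exceed unity.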

It would seem, therefore, that by the time we have 
added hierarchical tests to make them equal in number to 
the persons, we will necessarily have added a Heywood 
hierarchical case (of which there can be only one in a 
hierarchy). But we have agreed that the discovery of a 
Heywood case will cause us to abandon the hierarchy as 
a definition of g!

Mathematically this seems to mean that although the 
quantity S increases with each new test, provided it is not 
a Heywood case, yet S does not increase indefinitely, and 
the multiple correlation does not converge to perfect 
correlation. 

The case discussed above, where the number of tests is 
increased to equal the number of persons, may seem to 
the reader to be an academic case only. But the case of 
reducing the number of persons until they equal the number 
of tests is one which could easily be realized in practice, 
and presents equal theoretical difficulties. This draws at- 
tention from a new point of view to what has already 
been emphasized in Part III, the dependence of any 
definition of factors on the sample of persons tested. If 
we have a perfect hierarchy of, say, 50 tests, in a popula- 
tion of, say, 1,000 persons, and we reduce the number of 
persons by discarding some at random, it is, of course, to 
be expected that the correlations will change, and the 
hierarchy become disturbed. It would, however, at first 
sight appear possible to discard them so skilfully as not 
to disturb the hierarchy, or at least not disturb it much. 
But it would seem from the above considerations that try 
as we might, we could not, as the number of persons 
decreased towards fifty, prevent the correlations changing 
so as to give us a Heywood case, if we clung to hierarchical 
order. Or to put the same point in another way : a
sample of fifty persons from the above thousand, if it
gives hierarchical order, will give a Heywood case, and its
g will be impossible.

If the g corresponding to the original analysis on the 
thousand persons were anything real, such as a given 
quantity of mental energy available in each person, then 
it ought always to be possible, one might erroneously 
think, to find fifty persons and fifty tests to give a hierarchy, 
without a Heywood case. But that cannot be easily said.
It is impossible, from the correlations alone, to distinguish 
a real g from one imitated by a fortuitous coincidence of 
specifics. Even if g were a reality, a sample of persons 
equal in number to the tests could not give a hierarchy 
without a Heywood case, and their apparent g would be 
fortuitous. 

Now the case of a test of pure g is on the border line of 
the Heywood cases. It is clear then that it will be suspect, 
as being probably only fortuitous, if the number of persons 
does not far exceed the number of tests. 

7. Singly conforming tests. There remains one other
conceivable method of measuring g exactly,* by the use
of certain tests which, when they are all present, destroy
the hierarchy, although any one of them can enter the
battery without marring it : " singly conforming " tests
(Thomson, 1934c ; and 1935a, 253-6). It will be remem-
bered from the chapters on estimation that the reason
factors cannot be measured exactly, but have to be esti-
mated only, is that they outnumber the tests. Every
new test which conforms to a hierarchy adds a new specific
(unless it is pure g), and thus continues the excess of factors
over tests. It can occur, however, that the correlation of
two tests with each other breaks a hierarchy, although
either of them alone conforms otherwise. Such a case
occurs in the Brown-Stephenson battery, for example, one
of whose correlation coefficients has to be suppressed before
the hierarchy is acceptable.

* By " exactly " is meant, with the same exactness as the test
scores, without the additional indeterminacy due to an excess of
factors over tests.

In such a case, if the psychologist is prepared to accept
either test as a member of the battery, the erring correlation
coefficient must be due to these two tests sharing some
portion of their specifics with one another. If, as may
happen (apart from error which we are supposing absent),
their intercorrelation shows that they have only one specific
factor between them, and differ only in their saturations,
then they enable the estimate of g to be turned into accurate
measurement. For example, consider the following matrix
of correlations :

         1      2      3      4      5      6
  1      .    .669   .592   .458   .335   .251
  2    .669    .     .566   .438   .870   .240
  3    .592   .566    .     .387   .283   .212
  4    .458   .438   .387    .     .219   .164
  5    .335   .870   .283   .219    .     .120
  6    .251   .240   .212   .164   .120    .

This is a perfect hierarchy except for the correlation

$$ r_{25} = .870 $$

Every tetrad-difference, which does not contain this 
correlation, is zero. If either Test 2 or Test 5 is removed 
from the battery, there remains a perfect hierarchy. If 
Test 5 is removed, we can calculate from the remaining 
battery the g saturations : 

Test            1      2      3      4      6
g saturation  .837   .800   .707   .548   .300

If we remove Test 2 and restore Test 5, we get the fol-
lowing :

Test            1      3      4      5      6
g saturation  .837   .707   .548   .400   .300
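Both statements, that the tetrad-differences vanish unless
they involve the correlation of Tests 2 and 5, and that the
two reduced batteries yield the saturations tabulated, can
be checked from the correlations. The sketch below is an
illustration under stated assumptions : the function names
are invented here, and the saturations are obtained from a
row-sum formula that is exact for a table of rank 1, which
is not necessarily the computation used in the text.

```python
from itertools import combinations
from math import sqrt

# Correlations from the table in the text (upper triangle, Tests 1-6).
R = {
    (1, 2): .669, (1, 3): .592, (1, 4): .458, (1, 5): .335, (1, 6): .251,
    (2, 3): .566, (2, 4): .438, (2, 5): .870, (2, 6): .240,
    (3, 4): .387, (3, 5): .283, (3, 6): .212,
    (4, 5): .219, (4, 6): .164,
    (5, 6): .120,
}


def r(i, j):
    """Correlation of tests i and j, in either order."""
    return R[(min(i, j), max(i, j))]


def tetrads(tests):
    """Yield (pairs used, value) for every tetrad-difference among `tests`."""
    for a, b, c, d in combinations(tests, 4):
        for p, q, s, t in (
            ((a, b), (c, d), (a, c), (b, d)),
            ((a, b), (c, d), (a, d), (b, c)),
            ((a, c), (b, d), (a, d), (b, c)),
        ):
            yield {p, q, s, t}, r(*p) * r(*q) - r(*s) * r(*t)


def g_saturations(tests):
    """Saturations from row sums: a_i^2 = (A_i^2 - A'_i) / (T - 2 A_i)."""
    T = 2 * sum(r(i, j) for i, j in combinations(tests, 2))
    result = {}
    for i in tests:
        row = [r(i, j) for j in tests if j != i]
        A = sum(row)                      # sum of the row
        A_sq = sum(x * x for x in row)    # sum of squares of the row
        result[i] = sqrt((A * A - A_sq) / (T - 2 * A))
    return result


everything = [1, 2, 3, 4, 5, 6]
clean = max(abs(v) for pairs, v in tetrads(everything) if (2, 5) not in pairs)
broken = max(abs(v) for pairs, v in tetrads(everything) if (2, 5) in pairs)
print(round(clean, 4), round(broken, 4))
print({t: round(a, 3) for t, a in g_saturations([1, 2, 3, 4, 6]).items()})
print({t: round(a, 3) for t, a in g_saturations([1, 3, 4, 5, 6]).items()})
```

Because the correlations are given to three decimals only,
the tetrad-differences that should vanish come out as small
rounding residues rather than exact zeros.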

From either hierarchy we can estimate g. The correlation
of our estimates with " true g " will be

$$ \sqrt{\frac{S}{S + 1}} $$

where

$$ S = \sum \frac{\text{saturation}^2}{1 - \text{saturation}^2} $$

and we find for the two hierarchies the g correlations of
.92 and .90.
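The arithmetic behind these two figures can be set out as
follows. This is a worked check added for illustration ; the
labels $S_1$ and $S_2$ for the two hierarchies are introduced
here, and each term is the quantity
saturation² / (1 - saturation²) rounded to three decimals.

$$ S_1 \approx 2.340 + 1.778 + 0.999 + 0.429 + 0.099 = 5.645, \qquad \sqrt{\frac{5.645}{6.645}} \approx .92 $$

$$ S_2 \approx 2.340 + 0.999 + 0.429 + 0.190 + 0.099 = 4.057, \qquad \sqrt{\frac{4.057}{5.057}} \approx .90 $$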

Suppose now that we had left both Tests 2 and 5 in the 
battery with which to estimate g, after calculating their g 
saturations from the two separate hierarchies, what in- 
fluence would this have had upon the accuracy of our 
estimate ? It is of some interest actually to carry out 
this calculation by Aitken's method, using all the tests 
with the g saturations given above. A calculation keeping 
three places of decimals gives for the regression coefficients : 

Test                       1      2      3      4       5      6
Regression coefficient   .005  1.856   .003   .001  -1.213   .002



which suggests (what would actually be the case if more 
decimals were retained throughout) that all the regression 
coefficients except those for Tests 2 and 5 vanish. If we 
calculate the multiple correlation of this battery with g, 
by finding the inner product of the g saturations with the 
above regression coefficients, we find that it is exactly 
unity. 
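The vanishing of the other four coefficients can be
reproduced numerically. The sketch below is an illustration
and makes two assumptions that should be noted : the
correlations are rebuilt exactly from the quoted saturations
(so they differ slightly from the three-decimal table), with
Tests 2 and 5 given one and the same specific ; and the
equations are solved by plain Gaussian elimination, not by
Aitken's method.

```python
from math import sqrt

# g saturations quoted in the text for Tests 1-6.
a = [.837, .800, .707, .548, .400, .300]
n = len(a)

# Assumed model: z_i = a_i g + sqrt(1 - a_i^2) s_i, all specifics independent
# except that Tests 2 and 5 share the same specific.
R = [[1.0 if i == j else a[i] * a[j] for j in range(n)] for i in range(n)]
shared = a[1] * a[4] + sqrt(1 - a[1] ** 2) * sqrt(1 - a[4] ** 2)
R[1][4] = R[4][1] = shared


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    size = len(m)
    for col in range(size):
        pivot = max(range(col, size), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, size):
            f = m[k][col] / m[col][col]
            for c in range(col, size + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, size))
        x[i] = (m[i][size] - tail) / m[i][i]
    return x


weights = solve(R, a)                 # regression coefficients of g on the tests
r_squared = sum(w * s for w, s in zip(weights, a))   # inner product with saturations
print([round(w, 3) for w in weights], round(r_squared, 6))
```

With exact correlations the coefficients for Tests 1, 3, 4,
and 6 are zero to within floating-point error, and the inner
product of the coefficients with the saturations is unity.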

The reason for this is that the correlation of Tests 2
and 5 is such as to show that their specifics are identical,
the two tests differing only in their loadings. Their
equations are

$$ z_2 = .8 g + .6 s_2 $$

$$ z_5 = .4 g + .917 s_5 $$

If the whole of $s_2$ is identical with the whole of $s_5$, their
intercorrelation should be

$$ .8 \times .4 + \sqrt{1 - .8^2}\,\sqrt{1 - .4^2} = .870 $$

and this is its experimental value.

We could, therefore, have seen at the beginning, if we 
had tested the above fact, that these two tests would make 
a perfect battery for measuring g. We have the simul- 
taneous equations 

$$ z_2 = .8 g + .6 s $$

$$ z_5 = .4 g + .917 s $$

from which we can eliminate s by multiplying by

.917 and .600

respectively, numbers which are exactly in the ratio of the
regression coefficients found above

1.856 and 1.213.
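Written out, the elimination runs as follows. This is a
worked step added for illustration, using only the two
equations just given.

$$ .917 z_2 - .600 z_5 = (.917 \times .8 - .600 \times .4)\, g = .4936\, g $$

$$ g \approx 1.858 z_2 - 1.216 z_5 $$

The multipliers of $z_2$ and $z_5$ agree with the regression
coefficients 1.856 and -1.213 to within the rounding of the
three-decimal calculation.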

In fact, we could have performed the regression calcula- 
tion on these two tests alone, when it would have appeared 
as follows :

              1.000     .870   -1.000      .        .870
               .870    1.000      .     -1.000      .870
               .800     .400      .        .       1.200

 (4.1135)               .2431    .8700  -1.0000     .1131

                       1.0000   3.5787  -4.1135     .4652
                       -.2960    .8000     .        .5040

                                1.8593  -1.2176     .6417

giving (more exactly) the same regression coefficients as
before.
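The successive stages of this calculation can be retraced
mechanically. The sketch below is an illustration of
pivotal condensation with a check column of row sums ; the
function name `regress_two_tests` and the layout of the rows
are assumptions made for the example and are not claimed to
reproduce every detail of Aitken's procedure.

```python
def regress_two_tests(r, a1, a2):
    """Regression coefficients of g on two tests with intercorrelation r
    and g saturations a1, a2, found by pivoting with a check column."""
    rows = [
        [1.0, r, -1.0, 0.0],
        [r, 1.0, 0.0, -1.0],
        [a1, a2, 0.0, 0.0],
    ]
    rows = [row + [sum(row)] for row in rows]      # append the check column

    def pivot(block):
        """Eliminate the first column, using the first row (leading 1) as pivot."""
        head = block[0]
        return [[x - row[0] * h for x, h in zip(row[1:], head[1:])]
                for row in block[1:]]

    stage2 = pivot(rows)                           # pivot on 1.000
    divisor = stage2[0][0]                         # 1 - r^2
    stage2[0] = [x / divisor for x in stage2[0]]   # divide through by the pivot
    stage3 = pivot(stage2)
    b1, b2, check = stage3[0]
    assert abs((b1 + b2) - check) < 1e-9           # check column still agrees
    return b1, b2


b2_coeff, b5_coeff = regress_two_tests(.870, .800, .400)
print(round(b2_coeff, 4), round(b5_coeff, 4))
```

The check column, being the sum of each row, must remain
the sum of the row after every stage, which is what the
internal assertion verifies.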

We see, therefore, that under certain hypothetical
circumstances, a more exact estimate of g can be obtained
from two of these " singly conforming " tests than from the
hierarchy with which they conform individually. Those
circumstances are, that their correlation with one another
(the correlation which breaks the hierarchy because it is
too large) should either equal

$$ r_{ig} r_{jg} + \sqrt{(1 - r_{ig}^2)(1 - r_{jg}^2)} $$

or should approach this value.

It cannot in actual practice be expected to equal it, as 
in our artificial example. For we have disregarded errors, 
which are sure in some measure to be present. At what 
stage will the pair of singly conforming tests cease to be 
a better measure of g than the better of the two hierarchies 
made by deleting either the one or the other ? If in our
example the correlation .870 of Tests 2 and 5 be imagined
to sink little by little, the correlation of their estimate
with g will sink from unity. The better of the two hier-
archies gives a multiple correlation of .922. When the
correlation $r_{25}$ has sunk from .870 to .847, these two singly
conforming tests will give the same multiple correlation,
.922. If this defect from the full .870 is due entirely to
error, then a fall to .847 corresponds to reliabilities of the
two tests of the order of magnitude of .98, if they are
equally reliable. This is a very high reliability, seldom 
attained, so that in a case like our example quite a small 
admixture of error would make the singly conforming 
tests no better at estimating g than the hierarchy. We 
are here, however, neglecting the fact that error would also 
diminish the efficiency of the hierarchy. Nevertheless, the 
chance of finding a pair of singly conforming tests, highly 
reliable, and having no specifics except that which they 
share, seems small, as small as the chance of finding a test 
of pure g, perhaps. It might possibly turn out, however, 
that a matrix of several (say t) singly conforming tests 
would be practicable. Such a set would measure g exactly 
if among them they added only t - 1 new specifics to the
hierarchy. Their saturations would be found by placing 
them one at a time in the hierarchy, and then their regres- 
sion on g calculated by Aitken's method. The necessity 
for the hierarchy in the background, in all this, is clear : it 
is there to assure us that each singly conforming test is 
compatible with the definition of g, and to enable its g 
saturation to be calculated. 

8. The danger of " reifying " factors. The orthodox view
of psychologists trained in the Spearman school is that g is, 
of all the factors of the mind, the most ubiquitous. " All 
abilities involve more or less g," Spearman has said, al- 
though in some the other factors are " so preponderant 
that, for most purposes, the g factor can be neglected." 
With this view, the present author has always agreed, 
provided that g is interpreted as a mathematical entity 
only, and judgment is suspended as to whether it is any- 
thing more than that. 

The suggestion, however, that g is " mental energy," of 
which there is only a limited amount available, but avail- 
able in any direction, and that the other factors are the 
neural machines, is one to be considered with caution. 
The word energy has a definite physical meaning. " Mental
energy " may convey the meaning that the energy spoken
of is the same as physical energy, though devoted to mental 
uses. If that meaning is accepted, innumerable difficulties 
follow, not the least being the insoluble questions of the 
connexion of body and mind, and of freewill versus 
determinism. A less obscure difficulty is that there seems 
to be no easily conceivable way in which the " energy " 
of the whole brain can be used in any direction indifferently, 
except by the " neural engines " also all taking part. The 
energy of a neurone seems to reside in it, and the passage 
of a nerve impulse along a neurone seems to resemble 
rather the burning of a very rapid fuse, than the conduction 
of electricity, say, by a wire. 

If " mental energy " does not mean physical energy at 
all, but is only a term coined by analogy to indicate that 
the mental phenomena take place " as if " there were such 
a thing as mental energy, these objections largely disappear. 
Even in physical or biological science, the things which are 
discussed and which appear to have a very real existence 
to the scientist, such as " energy," " electron," " neutron," 
" gene," are recognized by the really capable experimenter 
as being only manners of speech, easy ways of putting into 
comparatively concrete terms what are really very abstract 
ideas. With the bulk of those studying science there exists 
always the danger that this may be taken too literally, but 
this danger does not justify us in ceasing to use such terms. 
In the same way, if terms like " mental energy " prove to 
be useful, and can be kept in their proper place, they may 
be justified by their utility. The danger of " reifying " 
such terms, or such factors as g, v, etc., is, however, very 
great, as anyone realizes who reads the dissertations 
produced in such profusion by senior students using these 
new factorial methods. 



CHAPTER XVI 

" ORTHOGONAL SIMPLE STRUCTURE " 

1. Simultaneous definition of common factors. In a sense, 
Thurstone's system of multiple common factors is a 
generalization of the original Spearman system which had 
only one. It recognizes that matrices of correlation 
coefficients are not usually reducible to rank 1, but 
that they are usually reducible to a low rank, and it 
replaces the analysis into one common factor and specifics 
by an analysis into several common factors and specifics, 
keeping the number of common factors at a minimum. It 
does not lay the great stress on the ubiquity and domin- 
ance of g which is found in the Spearman system. 

Spearman's system, having defined g as well as possible 
by an extended hierarchy, goes on then to definitions of 
the next most important factors, by similar means. It 
looks upon any complex matrix of correlations as being 
due to lesser hierarchies superimposed upon the g hierarchy. 
Moving in accordance with a very commonly held belief 
which almost certainly has some justification, it has sought 
and found " verbal " and " practical " factors to add to g, 
and is groping for some kind of character or emotional 
factor which would complete the main picture. " One at a 
time " has been its motto. 

Moving along another route, Thurstone has endeavoured 
to define several factors by one matrix of correlations. 
Although the campaign of the Spearman school was 
presumably the only method open to pioneers, a student 
must be struck by the fact that the standard definition 
of g is made by a battery of tests (Brown and Stephenson, 
1933) which is not really reducible to rank 1 until a large 
verbal factor has been removed by mathematical means. 
Just as a battery to define g has to be purified either by 
the actual removal of tests or by the mathematical removal 
of factors before it is suitable as such a definition, so not
every battery will define a group of common factors.
Thurstone batteries, like Spearman's, have to be composed
of selected tests, and purified if the selection is not com- 
plete ; and batteries which give incompatible analyses are 
conceivable. 

2. Need for rotating the axes. Actually, experimenters 
have first assembled a number of tests which appeared to 
them to be likely to contain only, say, r common factors, 
factors which they have already suspected to exist and 
have tentatively named. They have then ascertained by 
using Thurstone's approximate communalities whether a
reduced rank of r can be achieved as a sufficiently close 
approximation, by examining the residues after r factors 
have been " taken out." By analogy with Spearman's 
purification process, they might then remove any tests 
which were preventing this ; but such purification has not 
been very usual though it seems just as justifiable here as 
in a hierarchy. Let us suppose that a battery, assembled 
because it appeared, psychologically, to contain r common 
factors, does give a matrix which can be reduced to 
rank r. 

As was explained towards the end of Chapter II, the 
loadings given by the " centroid " process then include a 
number of negative values, and these the psychologist has 
difficulty in accepting in any large numbers. For it is 
hard for him to conceive of psychological factors which 
help in some tests and hinder in others, except in rare 
cases. The mathematician can then " rotate " the factor
axes within the common-factor space (Thurstone's principle 
forbids him to go outside it) in search of a position which 
will satisfy the psychologist. One way of doing this has 
already been sketched in Chapter II, Section 8. It has 
been used with excellent effect by W. P. Alexander 
(Alexander, 1935), but involves assuming (a) that the com- 
munality of a certain test is entirely due to one factor ; 
(b) that the communality of a second test is entirely due 
to this factor and one other ; (c) and so on for r - 1 tests,
where r is the number of factors. The criterion of success
with this method is to see whether, when these assumptions
are made, negative loadings disappear ; and whether the
consequent loadings of those tests about which no assump-
tions are made are compatible with the psychologist's 
psychological analysis of them. It cannot be too emphati- 
cally pointed out that the first factors which emerge from 
the " centroid " process and the minimum-rank principle 
need not have psychological significance as unitary primary 
traits. It is only after rotation to a suitable position that 
this can be expected. 

3. Agreement of mathematics and psychology. It becomes 
increasingly clear that the whole process is one by which 
a definition of the primary factors is arrived at by satisfying 
simultaneously certain mathematical principles and certain 
psychological intuitions. When these two sides of the 
process click into agreement, the worker has a sense of 
having made a definite step forward. The two support 
one another. Obviously the goal to be hoped for along this 
line of advance will be the discovery of some mathematical 
process which always leads to a unique set of factors mainly 
acceptable to the psychologist. If such could be dis- 
covered and found to produce a few factors over and above 
those recognized as already known by other means, the 
new factors would stand a good chance of acceptance on 
the strength of their mathematical descent only. And no 
doubt the psychologist would be prepared to make a few 
concessions and changes in his previous ideas to fit in 
with any mathematical scheme which already gave 
much satisfaction and was objective and unique in its 
results. 

It is here that Thurstone's notion of " simple structure " 
is offered as a solution (Vectors, Chapters 6-8). This idea is
that the axes are to be rotated until as many as possible of 
them are at right angles to as many as possible of the 
original test vectors ; and that the battery is not suitable 
for defining factors unless such a rotation is uniquely 
possible, a rotation which will leave every axis at right 
angles to at least as many tests as there are factors, and 
every test at right angles to at least one axis. 

When the vectors of a test and a factor are at right 
angles, the loading of the factor in that test is zero. 
Thurstone's " simple structure " is therefore indicated by
a large number of zeros in the matrix of loadings, so large
that there will be only one position of the axes (if any) 
which satisfies the requirement. His search, be it repeated, 
is for a set of conditions which will make the solution 
unique. We have seen him approaching this goal by 
stages. Unless the battery is large, so that 



n > 



(see Chapter II, Section 9), the communalities are not 
unique. Even when the battery is large enough, the axes 
representing factors may be rotated to positions among 
which there is no one specially marked out. Then comes 
the demand that there be this large number of zero loadings. 
Most batteries of tests will not allow this demand to be 
satisfied, but with some it can just be attained. Only 
these last, it is Thurstone's conviction, are suitable for 
defining primary factors, and it is his faith that the factors 
thus mathematically defined will be found to be acceptable 
as psychologically separable unitary traits. 

4. An example of six tests of rank 3. To make our 
remarks more definite and concrete, let us suppose that 
we have a battery of six tests whose matrix of correlations 
can be reduced to rank 3. In practice, of course, six tests 
are far too few, and more than three factors quite likely. 
The matrix of loadings given by the " centroid " system 
contains at first negative quantities. Thus from the 
correlations :

         1      2      3      4      5      6
  1      .    .525   .000   .000   .448   .000
  2    .525    .     .098   .306   .349   .000
  3    .000   .098    .     .133   .314   .504
  4    .000   .306   .133    .     .000   .000
  5    .448   .349   .314   .000    .     .307
  6    .000   .000   .504   .000   .307    .



with the communalities

.674   .634   .558   .415   .490   .493

we get by the " centroid " process the matrix of loadings :

           I       II      III
  1      .542    .612    .074
  2      .629    .342   -.348
  3      .529   -.492    .191
  4      .281   -.182   -.550
  5      .628    .143    .274
  6      .429   -.424    .359
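That these centroid loadings do account for the
correlations and communalities can be confirmed by forming
the inner products of their rows. The sketch below is an
illustration only. The signs of the loadings are taken as
those which reproduce the correlations, which is an
assumption wherever the printed signs are not legible, and
the variable names are invented for the example.

```python
# Centroid loadings (I, II, III) for Tests 1-6. Signs are an assumption:
# they are the ones that reproduce the correlations given in the text.
F = [
    (.542,  .612,  .074),
    (.629,  .342, -.348),
    (.529, -.492,  .191),
    (.281, -.182, -.550),
    (.628,  .143,  .274),
    (.429, -.424,  .359),
]

# Correlations (upper triangle) and communalities from the text.
R = {
    (1, 2): .525, (1, 3): .000, (1, 4): .000, (1, 5): .448, (1, 6): .000,
    (2, 3): .098, (2, 4): .306, (2, 5): .349, (2, 6): .000,
    (3, 4): .133, (3, 5): .314, (3, 6): .504,
    (4, 5): .000, (4, 6): .000,
    (5, 6): .307,
}
communalities = [.674, .634, .558, .415, .490, .493]


def inner(i, j):
    """Inner product of the loadings of tests i and j (numbered from 1)."""
    return sum(x * y for x, y in zip(F[i - 1], F[j - 1]))


worst_correlation = max(abs(inner(i, j) - value) for (i, j), value in R.items())
worst_communality = max(abs(inner(t, t) - h)
                        for t, h in enumerate(communalities, start=1))
column_sums = [sum(column) for column in zip(*F)]

print(round(worst_correlation, 4), round(worst_communality, 4))
print([round(s, 3) for s in column_sums])
```

The second and third column sums come out at practically
zero, as they should for centroid factors after the first.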



It is the factor axes indicated by these loadings that 
Thurstone wishes to rotate until there are no negative 
loadings and enough zero loadings to make the position 
uniquely defined. For this last purpose he finds, empiri- 
cally, that it is necessary to require 

(a) At least one zero loading in each row ; 

(b) At least as many zero loadings in each column as 
there are columns (here three) ; and 

(c) At least as many XO or OX entries in each pair of 
columns as there are columns. By an XO entry is meant 
a loading in the one column opposite a zero in the other. 

" At least one zero loading in each row." This means 
that no test may contain all the common factors. In 
making up the battery, then, the experimenter, with some 
idea in his mind as to what the factors are, will endeavour 
to ensure that they are not all present in any one test. 
This would, for example, exclude from a Thurstone battery 
any very mixed group test, or a mixed test like the Binet- 
Simon which is itself a whole battery of varied items. 

" At least as many zeros in each column as there are 
columns," that is, as there are common factors. This 
means that in a Thurstone battery no factor may be general, 
but must be missing in several tests. 

The requirement as to the number of XO or OX entries 
is intended to ensure that the tests are qualitatively 
distinct from one another. 

Now, these requirements cannot generally be met by a 
matrix of loadings. It will in general be impossible to 
rotate the axes (keeping them orthogonal) until every 
axis is at right angles to r test vectors. The above
example has, however, been constructed so that this can
be done.

The correlations were in fact made from the loadings :

           A       B       C
  1        .       .     .821
  2        .     .475    .639
  3      .718    .206      .
  4        .     .644      .
  5      .438      .     .546
  6      .702      .       .



and the centroid loadings must therefore be capable of 
being rotated rigidly into this form, retaining ortho- 
gonality. 
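The three requirements (a), (b), and (c) stated earlier in
this section can be tested mechanically against the A, B, C
loadings. The sketch below is an illustration only : the
function name `simple_structure_report` and the tolerance
used to decide what counts as a zero loading are assumptions
of the example.

```python
# Loadings on A, B, C for Tests 1-6, with zeros where the table is blank.
ABC = [
    (0.0,  0.0,  .821),
    (0.0,  .475, .639),
    (.718, .206, 0.0),
    (0.0,  .644, 0.0),
    (.438, 0.0,  .546),
    (.702, 0.0,  0.0),
]


def simple_structure_report(loadings, tol=1e-9):
    """Return three booleans for requirements (a), (b), and (c)."""
    r = len(loadings[0])                              # number of factors
    zero = [[abs(x) <= tol for x in row] for row in loadings]
    rows_ok = all(any(z) for z in zero)               # (a) a zero in every row
    cols_ok = all(sum(z[k] for z in zero) >= r        # (b) r zeros per column
                  for k in range(r))
    pairs_ok = all(                                   # (c) r XO or OX entries
        sum(z[p] != z[q] for z in zero) >= r
        for p in range(r) for q in range(p + 1, r)
    )
    return rows_ok, cols_ok, pairs_ok


def correlation(i, j):
    """Correlation of tests i and j (numbered from 1) implied by the loadings."""
    return sum(x * y for x, y in zip(ABC[i - 1], ABC[j - 1]))


print(simple_structure_report(ABC))
print(round(correlation(1, 2), 3), round(correlation(3, 6), 3))
```

The second line of output illustrates that the correlations
of the section are the inner products of these rows.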

5. Two-by-two rotation. The problem for the experi- 
menter, however, is to discover this " simple structure," if 
it exists ; he is not, like us, in the position of knowing 
that it does exist, and what it is. Thurstone's original
method was to make a diagram of two of the centroid
factors, and rotate them ; then to make other diagrams of
two factors at a time, and rotate them, in each rotation
endeavouring to obtain some zero loadings. Let us illustrate
by our artificial example, taking first the centroid factors
I and II. Using their centroid loadings as co-ordinates, we
obtain Figure 25, where each test is represented by a point,
and the centroid axes by the co-ordinate axes marked I
and II. At once we notice that the test-points 3, 4, and 6
are almost collinear on a radius from the origin, and that
if we rotate the axes clockwise through about 42° the new
position of I, labelled I_1 in the diagram, will almost pass
through these test-points, while the new axis II_1 will almost
pass through test-point 1. On these new axes, therefore,
Tests 3, 4, and 6 will have hardly any projections on axis
II_1 ; that is, will have hardly any loadings in a factor along
II_1. The new co-ordinates of the test-points on I_1 and II_1
could be measured on the diagram, and the reader is
advised, in doing rotations, always to make a diagram and
find the approximate new loadings thus, as this furnishes an
excellent check on the arithmetical calculation.

Figure 25.

This arithmetical calculation is based on the fact that
if a test-point has the co-ordinates x and y with reference
to the original centroid axes I and II, its new co-ordinates
on the rotated axes I_1 and II_1 will be x cos 42° - y sin 42°
on I_1 and x sin 42° + y cos 42° on II_1. From tables we
find sin 42° = .669, and cos 42° = .743, and the calculation
can be done readily on a machine, or with Crelle's multipli-
cation tables. We have then :

          Old loadings        New loadings
            I      II          I_1     II_1
  1       .542    .612        -.007    .817
  2       .629    .342         .239    .675
  3       .529   -.492         .722   -.012
  4       .281   -.182         .331    .053
  5       .628    .143         .371    .526
  6       .429   -.424         .602   -.028

multipliers  .743  -.669  for I_1 loadings,
             .669   .743  for II_1 loadings.

At this point a check should be made by seeing that the
sum of the squares of the new pair of loadings is identical
with the sum of the squares of the old pair, for each test.
We have now obtained our desired three zero (or near zero)
loadings in factor II_1. Accepting the approximations to
zero as good enough for the present, we next make Figure 26
from the loadings of I_1 and III in the same way as we
made the former figure. In this, Test 1 falls quite near the
origin. Tests 5 and 6 are approximately on one radius, and
Tests 2 and 4 on another, and these radii are at right angles
to one another. If we rotate the axes I_1 and III rigidly
through a clockwise turn of about 49° they will pass almost
through these radial groups and nearly zero projections will
result.* Using sin 49° = .755 and cos 49° = .656 we perform
a similar calculation to the preceding, using the loadings
I_1 and III as starting-point and obtaining loadings on I_2
and III_1 (the subscript indicating the number of rotations
that axis has undergone). We have finally, putting our
results together, the table of loadings FA.†

Figure 26.

           I_2     II_1    III_1
  1      -.060    .817     .043
  2       .420    .675    -.048
  3       .329   -.012     .670
  4       .632    .053    -.111
  5       .037    .526     .460
  6       .124   -.028     .690



Clearly, this is an approximation to the loadings of the 
factors A, B, and C which we who are in the secret (as a
real experimenter is not) know to have been used in making
the correlations : III_1 here is A, I_2 here is B, and II_1 is C.
The small loadings are not quite zero, and the other load- 
ings not quite the same, but a further set of rotations 
would refine the results and bring them nearer to the 
ABC values. 
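The two rotations of this section, together with the check
on the sums of squares, can be sketched as follows. This is
an illustration under stated assumptions : the signs of the
centroid loadings are those which reproduce the correlations,
the sines and cosines are computed exactly rather than taken
to three decimals from tables, and the names `rotate_pair`
and `rotated` are invented for the example.

```python
from math import cos, sin, radians

# Centroid loadings (I, II, III); signs assumed as explained in the lead-in.
centroid = {
    1: (.542,  .612,  .074),
    2: (.629,  .342, -.348),
    3: (.529, -.492,  .191),
    4: (.281, -.182, -.550),
    5: (.628,  .143,  .274),
    6: (.429, -.424,  .359),
}


def rotate_pair(x, y, degrees):
    """New co-ordinates after turning both axes clockwise through `degrees`."""
    t = radians(degrees)
    return x * cos(t) - y * sin(t), x * sin(t) + y * cos(t)


rotated = {}
for test, (f1, f2, f3) in centroid.items():
    i1, ii1 = rotate_pair(f1, f2, 42)      # first rotation: axes I and II
    i2, iii1 = rotate_pair(i1, f3, 49)     # second rotation: axes I_1 and III
    # A rigid rotation leaves each test's sum of squares unchanged.
    before = f1 ** 2 + f2 ** 2 + f3 ** 2
    after = i2 ** 2 + ii1 ** 2 + iii1 ** 2
    assert abs(before - after) < 1e-12
    rotated[test] = (i2, ii1, iii1)        # columns I_2, II_1, III_1

for test, row in rotated.items():
    print(test, [round(v, 3) for v in row])
```

Because exact trigonometric values are used, the last
decimal may differ by a unit from a calculation carried out
with three-figure tables.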

6. New rotational method. When this two-by-two rota-
tional method is used on a large battery of tests, with
perhaps six or seven factors instead of three, it is not 
only laborious but somewhat difficult to follow. Thur- 
stone has, however, devised a method of rotation which 
takes the factors three at a time, and to this we now turn, 
still using our small artificial example as illustration. In
this example, since there are only three factors, this new
method leads to a complete solution at once. With more
factors the matter would be more complicated.

* The rotation might with advantage have been carried a little
further.

† The matrix symbols, using Thurstone's notation, are given for
the convenience of mathematical readers. Others should ignore
them.

If the reader will think of the three centroid factors as
represented by imaginary lines in the room in which he is
sitting (Figure 27), he will be aided in following the
explanation of this newer method. Imagine the first
centroid axis to be vertically in the middle of the room,
and the other two centroid axes on the carpet, at right
angles to the first and to each other. The test-points are
in various positions in the room space, if we take their three
centroid loadings as co-ordinates and treat the distance from
floor to ceiling as unity. Imagine each test-point joined
by a line to the origin (in the middle of the carpet, where
the axes cross). The lengths of these lines are the square
roots of the communalities, and the loadings on the first
centroid factor are their projections on to the vertical axis,
the height, that is, of each test-point above the floor.

Figure 27 (not to scale).

Thurstone now imagines each of these lines or com- 
munality vectors produced until it hits the ceiling, making 
a pattern of dots on the ceiling. These extended vectors 
now all have unit projection on the first centroid axis, 
for we agreed to call the distance from floor to ceiling 
unity. Their y and z co-ordinates on the ceiling will be
correspondingly larger than their loadings on the second
and third centroid factors, and can be obtained by dividing
each row of the centroid loadings by the first loading. In
our case this gives us the following table, obtained in the
manner just mentioned from the table of centroid loadings
in Section 4.

        Extended centroid projections
           I_e      II_e     III_e
  1      1.000    1.129     .137
  2      1.000     .544    -.553
  3      1.000    -.930     .361
  4      1.000    -.648   -1.957
  5      1.000     .228     .436
  6      1.000    -.988     .837



The second and third columns are now the co-ordinates
of those dots on the ceiling of which we spoke. A diagram
of the ceiling, seen from above, is given in Figure 28, and
the important point about it is that the dots form
a triangle.

Figure 28.

If the reader will now picture this triangle as drawn on
the ceiling of his room, and remember that the origin, where
the centroid axes crossed, is in the middle of the carpet,
he can next imagine an inverted three-cornered pyramid,
with the triangle on the ceiling as its base, the origin in
the middle of the carpet as its apex and the communality
vectors 1, 4, and 6 as its edges. The vector 5 lies on one
of the faces of this pyramid ; vector 2 lies on another ;
vector 3 lies on the remaining face, all springing from the
origin and going up to the ceiling.
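The extension of the vectors to the ceiling, and the fact
that the dots fall along the three sides of a triangle, can
be verified numerically. The sketch below is an illustration
only : the signs of the centroid loadings are assumed to be
those which reproduce the correlations, and the helper
`twice_area`, which vanishes for collinear points, is
introduced here for the purpose.

```python
# Centroid loadings (I, II, III); signs are an assumption (see lead-in).
centroid = {
    1: (.542,  .612,  .074),
    2: (.629,  .342, -.348),
    3: (.529, -.492,  .191),
    4: (.281, -.182, -.550),
    5: (.628,  .143,  .274),
    6: (.429, -.424,  .359),
}


def extend(loadings):
    """Divide a row of loadings by its first loading, giving unit height."""
    first = loadings[0]
    return tuple(x / first for x in loadings)


# (y, z) co-ordinates of each dot on the ceiling.
ceiling = {test: extend(row)[1:] for test, row in centroid.items()}


def twice_area(p, q, s):
    """Twice the signed area of triangle p, q, s; zero if they are collinear."""
    return (q[0] - p[0]) * (s[1] - p[1]) - (s[0] - p[0]) * (q[1] - p[1])


sides = {side: twice_area(*(ceiling[t] for t in side))
         for side in ((1, 2, 4), (1, 5, 6), (4, 3, 6))}
corners = twice_area(ceiling[1], ceiling[4], ceiling[6])

print({t: tuple(round(v, 3) for v in yz) for t, yz in ceiling.items()})
print({side: round(a, 3) for side, a in sides.items()}, round(corners, 3))
```

The three groups of dots named in the text give areas
that are nearly zero, whereas the corners 1, 4, and 6 enclose
a substantial triangle.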

7. Finding the new axes. If now we choose for new
axes (in place of the centroid axes) three lines at right
angles respectively to the three plane faces of our pyramid,
the test projections on these axes will clearly have the
zeros we desire. The three vectors 1, 2, and 4 all lie in
one face, and will have zero projections on the axis A'
at right angles to that face. The vectors 1, 5, and 6 will
have zero projections on the line B' at right angles to their
face. The vectors 3, 4, and 6 will have zero projections on
C' at right angles to their face. The reader should
visualize these new axes in his room. It remains to be
shown how the other, non-zero, projections are to be
calculated, and to inquire whether these new axes are
orthogonal, and whether they can be identified with the
original A, B, and C. The first step is to obtain the equa-
tions of the three sides of the triangle in the diagram. 
Where there are many tests and the dots are not perfectly 
collinear, one plan is to draw a line through them by eye, 
and measure the distances a and b it cuts off on the axes, 
then using the equation 



Or we can write down the equations of the lines joining 
points at the corners, either actual test points, or the places 
where our lines intersect, using the equation 

(Iv mu) -\- (m v) y + (u I) z Q 
when /, m are the co-ordinates of one corner, and u, v of 
another. We obtain in our case 

2-121 + 2-094*/ - l-777a = for line 1, 2, 4 

- 1-080 + -700i/ + 2-1172 = 1, 5, 6 

2-476 + 2-794y + -340* = 4, 3, 6 

where y means the extended II, and z the extended III. 
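As a check on the arithmetic, the two-corner formula can be sketched in a few lines. The function names `line_through` and `normalize` are ours; the division by the root of the sum of squares, which the next paragraph describes, is included so that the result can be compared with the normalized coefficients.

```python
import math


def line_through(corner_one, corner_two):
    """Coefficients (c0, cy, cz) of c0 + cy*y + cz*z = 0 for the line
    joining corner_one = (l, m) and corner_two = (u, v)."""
    l, m = corner_one
    u, v = corner_two
    return (l * v - m * u, m - v, u - l)


def normalize(coefficients):
    """Divide by the root of the sum of squares of the coefficients."""
    norm = math.sqrt(sum(c * c for c in coefficients))
    return tuple(c / norm for c in coefficients)


# Ceiling co-ordinates (extended II, extended III) of tests 1 and 4.
side_124 = normalize(line_through((1.129, 0.137), (-0.648, -1.957)))
print([round(c, 3) for c in side_124])
```

Test 2, which lies on the same side of the triangle, should satisfy the resulting equation to within rounding.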

Before we go further we have to divide each equation 
through by the root of the sum of the squares of its 
coefficients, so that the new coefficients sum to unity when 
squared ; this is called normalizing and is necessary in 
order to keep the communalities right and for other reasons. 
The equations then are : 

$$
\begin{aligned}
-.611 + .603y - .512z &= 0 \qquad (1) \\
-.436 + .283y + .854z &= 0 \qquad (2) \\
 .660 + .745y + .091z &= 0 \qquad (3)
\end{aligned}
$$

and it is clear, from the way in which they have been 
reached, that these equations will be satisfied by the ex- 
tended co-ordinates of certain of the rows in the table on 
page 251. Consider the first equation and write its co- 
efficients above the columns of that table, placing -.611 
over the first column, thus : 





          -.611     .603    -.512    Weighted
Test        x        y        z        sum
 1        1.000    1.129     .137      .000
 2        1.000     .544    -.553      .000
 3        1.000    -.930     .361    -1.357
 4        1.000    -.648   -1.957      .000
 5        1.000     .228     .436     -.697
 6        1.000    -.988     .837    -1.635



If we multiply each column by the multiplier above it 
and add the rows we get the quantities shown on the right. 
The zeros are in the right places for factor A. The other 
loadings are, however, negative (that can be easily put 
right by changing all the signs of the multipliers, which 
we are at liberty to do) and are too large, because, of course, 
it is the extended loadings which have been used. We must 
multiply each of them by the original first centroid load- 
ing of the row, since at an earlier stage we divided by it. 
If we do so, we find that we get exactly the loadings of 
column A of the table originally used to make the corre- 
lations. Thus : 

1.357 × .529 = .718 
 .697 × .628 = .438 
1.635 × .429 = .701 

the difference from .702 being due to rounding off only. 

Or more simply, we could have applied our multipliers 
to the centroid loadings themselves, not to the extended 
projections. The zeros will obviously remain, and for the 
other loadings we obtain at once those of factor A. Simi- 
larly, using eqns. (2) and (3) we get the loadings of factors B 
and C exactly, except for an occasional difference due to 
rounding off at the third decimal place. We have, indeed, 
found the matrix product FA, 

$$
\begin{pmatrix}
.542 & .612 & .074 \\
.629 & .342 & -.348 \\
.529 & -.492 & .191 \\
.281 & -.182 & -.550 \\
.628 & .143 & .274 \\
.429 & -.424 & .359
\end{pmatrix}
\begin{pmatrix}
.611 & .436 & .660 \\
-.603 & -.283 & .745 \\
.512 & -.854 & .091
\end{pmatrix}
=
\begin{pmatrix}
\cdot & \cdot & .821 \\
\cdot & .475 & .639 \\
.718 & .206 & \cdot \\
\cdot & .644 & \cdot \\
.438 & \cdot & .546 \\
.702 & \cdot & \cdot
\end{pmatrix}
$$



except, as has been already said, for occasional dis- 
crepancies in the third decimal place. The procedure we 
have described has enabled us to discover this last matrix, 
with which in fact we began. And by analogy (is the 
deduction sound ?) an experimenter with experimental 
data who follows this procedure and reaches simple 
structure concludes that that is how his correlations were 
made. Certainly that is how they may have been made. 
The matrix A beginning with .611 is the rotating 
matrix which turns the axes I, II, III into the new posi- 
tions A, B, C. Its columns are the direction-cosines of 
A, B, and C with reference to the orthogonal system 
I, II, III. Are A, B, and C orthogonal ? The cosines of 
the angles between them can by a well-known rule be 
found by premultiplying the rotating matrix by its 
transpose. When we do so we find A'A = I, viz. : 

$$
\begin{pmatrix}
.611 & -.603 & .512 \\
.436 & -.283 & -.854 \\
.660 & .745 & .091
\end{pmatrix}
\begin{pmatrix}
.611 & .436 & .660 \\
-.603 & -.283 & .745 \\
.512 & -.854 & .091
\end{pmatrix}
= I
$$



(again allowing for third decimal place discrepancies). 
That is to say, the angles between A, B, and C have zero 
cosines, they are right angles. 
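Both products can be verified by direct multiplication. The following is a sketch only, in which the names `CENTROID`, `ROTATION`, `matmul` and `transpose` are ours and the signs are those implied by the worked example.

```python
# Centroid loadings F and the rotating matrix (columns = new axes A, B, C).
CENTROID = [
    [0.542,  0.612,  0.074],
    [0.629,  0.342, -0.348],
    [0.529, -0.492,  0.191],
    [0.281, -0.182, -0.550],
    [0.628,  0.143,  0.274],
    [0.429, -0.424,  0.359],
]
ROTATION = [
    [ 0.611,  0.436, 0.660],
    [-0.603, -0.283, 0.745],
    [ 0.512, -0.854, 0.091],
]


def matmul(left, right):
    """Ordinary row-by-column matrix product."""
    return [[sum(a * b for a, b in zip(row, column))
             for column in zip(*right)] for row in left]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


rotated = matmul(CENTROID, ROTATION)              # loadings on A, B, C
cosines = matmul(transpose(ROTATION), ROTATION)   # should be close to I

for row in rotated:
    print(" ".join(f"{value:7.3f}" for value in row))
```

Entries of `rotated` that differ from zero only in the third or fourth decimal place are the zeros of the simple structure; the off-diagonal entries of `cosines` are the cosines of the angles between the new axes.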

The axes A, B, and C, were drawn at right angles to the 
three planes which form the pyramid mentioned above, and 
therefore these three planes are also at right angles to one 
another. (Our rough sketch in Fig. 27 made the pyramid 
too acute.) It follows that A, B, and C are actually 
the edges of the pyramid. In our example (though this 
need not be the case) they happen to pass each through a 
test-point in the room, A through Test 6, B through Test 4, 
and C through Test 1. These tests are not identical with 
the factors, for each test contains a specific element, not in 
the common-factor space, but at right angles to it. What 
we have called a test-point is the end of the test vector 
projected on to the common-factor space. The complete 
test-vectors are out in a space of more dimensions, of which 
the three-dimensional common-factor space is a subspace. 

8. Landahl preliminary rotations. When there are more 
than three centroid factors, the calculations are not so 
simple. If the common-factor space is, for example, 
four-dimensional, then the table of extended vectors, in 
addition to its first column of unities, will have three other 
columns. The two-dimensional ceiling of our room, in our 
former analogy, has here become three-dimensional, a 
hyper-plane at right angles to the first centroid axis. On 
paper its dimensions can only be graphed two at a time, 
and no complete triangle will be visible among the dots. 
But sets of dots will be seen to be collinear, lines can be 
drawn through them, and a procedure similar to that out- 
lined above followed. This will become clearer when we 
work a four-dimensional example. First, however, it is 
desirable to explain, on our simple three-dimensional 
example, a device which facilitates the work on higher 
dimensional problems, called the Landahl rotation. It is 
unnecessary in the three-dimensional case, and we are 
using it only to explain it for use with more than three 
dimensions. 

A Landahl rotation turns the centroid axes solidly 
round until each of them is equally inclined to the original 
first centroid axis. In our imagined room the first cen- 
troid axis ran vertically from the middle of the floor 
to the middle of the ceiling, while the other two were 
drawn on the floor itself. Imagine all three (retaining 
their orthogonality) to be moved, on the origin as pivot, 
until they are equally inclined to the vertical. That is a 
Landahl rotation. The lines through the test-points have 
not moved. They remain where they were, and still hit 
the ceiling in the same pattern of dots. The projections of 
the extended vectors on to the original first centroid 
axis all still remain unity. But for the next step in this 
method we need their projections on to the Landahl axes. 
We obtain these by post-multiplying the matrix of 
centroid extended loadings by a Landahl matrix, an 
orthogonal matrix with each element in its first row equal 
to 1/√c, where c is the order of the matrix ; that is, its 
number of rows or columns (Landahl, 1938). We need a 
Landahl matrix of order 3, for example : 

 .577    .577    .577
 .816   -.408   -.408
 .000    .707   -.707

The element .577 is the cosine of the angle which each axis 
makes, after rotation, with the original position of the first 
centroid axis. 
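A matrix of this kind can be built for any order by the row-and-column recipe given at the end of this section. The sketch below follows that recipe; the function name `landahl_matrix` is ours, and the form it produces is only one of the alternative forms possible.

```python
import math


def landahl_matrix(order):
    """Build an orthogonal matrix whose first row is filled with
    1/sqrt(order), completing each column to unit length and each
    later row with equal negative elements summing to zero."""
    matrix = [[0.0] * order for _ in range(order)]
    matrix[0] = [1.0 / math.sqrt(order)] * order
    for row in range(1, order):
        column = row - 1
        # Leading element: whatever makes the column's squares sum to 1.
        above = sum(matrix[i][column] ** 2 for i in range(row))
        lead = math.sqrt(1.0 - above)
        matrix[row][column] = lead
        # Remaining elements: equal, negative, and summing to -lead.
        remaining = order - row
        for j in range(row, order):
            matrix[row][j] = -lead / remaining
    return matrix


for row in landahl_matrix(3):
    print(" ".join(f"{value:7.3f}" for value in row))
```

For order 3 this should reproduce the matrix shown above, and for order 5 its second and third rows begin with .894 and .866 respectively.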

When the table of extended vector projections on page 
251 is post-multiplied by the above matrix, the following 
table results, giving the projections of the extended 
vectors on to the Landahl axes L, M, N : 

Projections on Landahl axes

Test       L        M        N
 1       1.498     .213     .020
 2       1.021    -.036     .746
 3       -.182    1.212     .701
 4        .048    -.542    2.225
 5        .760     .792     .176
 6       -.229    1.572     .388



From this table three diagrams LM, LN, and MN can 
be made, and the reader is advised to draw them. Each 
of them shows a triangular distribution of dots and in this 
simple three-dimensional example only one of them is 
needed. But in a multi-dimensional problem several are 
needed, and usually only one line is used on each diagram 
employed. Here, from the LN diagram we find the 
equations of the three sides of the triangle to be : 

$$
\begin{aligned}
-2.205l - 1.450n + 3.332 &= 0 \\
 .368l + 1.727n - .586 &= 0 \\
 1.837l - .277n + .528 &= 0
\end{aligned}
$$

We want to make these homogeneous in l, m, and n, and so 
we add, after each of the numerical terms, the factor 
.577 (l + m + n), which equals unity. The equations 
then are : 

$$
\begin{aligned}
-.282l + 1.923m + .473n &= 0 \\
 .030l - .338m + 1.389n &= 0 \\
 2.142l + .305m + .028n &= 0
\end{aligned}
$$

After normalizing these become : 

$$
\begin{aligned}
-.141l + .961m + .236n &= 0 \\
 .021l - .236m + .971n &= 0 \\
 .990l + .141m + .013n &= 0
\end{aligned}
$$

Writing the coefficients as columns in a matrix, and 
premultiplying by Landahl's matrix (since at an earlier 
stage we post-multiplied by it) we obtain 

 .609    .436    .660
-.603   -.283    .745
 .513   -.853    .090

the same matrix A as we arrived at (page 254) without the 
use of Landahl's rotation. The advantage of using a 
Landahl rotation appears only in problems with more 
than three common factors. The reader can readily make 
a Landahl matrix of any required order, say 5. Fill the 
first row with the root reciprocal of 5, .447. Complete 
the first column by putting in the second place .894 
(because .447² + .894² = 1) and below that zeros. The 
second row must then be completed with equal elements, 
all negative, such that the row sums to zero. Then the 
second column is completed in a similar way, and the third 
row, and so on. The reader should finish it. There are 
alternative forms possible, one of which is used below. 

An unfinished Landahl matrix : 

 .447    .447    .447    .447    .447
 .894   -.224   -.224   -.224   -.224
 .000    .866   -.289   -.289   -.289
 .000    .000
 .000    .000

9. A four-dimensional example. The following example 
of a problem with four common factors is only partly 
worked out, so that the reader can finish it as an exercise. 
It also is an artificial example, and orthogonal simple 
structure can be arrived at. The centroid analysis gave 
four centroid factors with the loadings shown in this table : 

Centroid loadings F

Test       I       II      III      IV
 1       .727    .517    .094    .126
 2       .575    .105    .553    .049
 3       .810    .289    .246   -.246
 4       .588    .417   -.367   -.382
 5       .524   -.583   -.450    .183
 6       .549   -.435    .398   -.013
 7       .624   -.318   -.187   -.254
 8       .594   -.551    .239    .084
 9       .626    .252   -.169    .562
10       .645    .307   -.357   -.109

After these have been " extended " (i.e. divided in each 
row by the first loading) they were post-multiplied by a 
Landahl matrix, one of the alternative forms, viz. : 



 .5    .5    .5    .5
 .5    .5   -.5   -.5
 .5   -.5    .5   -.5
 .5   -.5   -.5    .5



and the resulting projections on the Landahl axes were 
thus found to be : 

Test       L        M        N        P
 1       1.007     .704     .122     .166
 2       1.115     .068     .848    -.030
 3        .679     .678     .625     .018
 4        .218    1.492     .158     .132
 5       -.311     .199     .453    1.660
 6        .455    -.247    1.270     .522
 7       -.107     .598     .808     .701
 8        .308    -.235    1.094     .833
 9       1.015     .387    -.285     .883
10        .376    1.099     .070     .454



Six diagrams can be made, and it is advisable to draw 
them all, though not all are necessary. The LN diagram 
is shown in Figure 29. We scan it for collinear points 
(not necessarily radial) which have all or nearly all the other 
points on one side of their line, and note the line 5, 4, 10, 9. 
Its equation is readily found to be approximately 

$$ .738l + 1.327n - .371 = 0. $$

We make this homogeneous by substituting for unity, after 
the numerical term -.371, the quantity .5 (l + m + n + p), 
for .5 is the cosine of the angle each of the Landahl axes 
makes with the original first centroid axis. This gives us 
the equation 

$$ .553l - .185m + 1.141n - .185p = 0. $$
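The step of making the equation homogeneous amounts to adding the same quantity to every coefficient, as the following sketch shows. The function name `homogenize` is ours; the coefficients are listed in the order l, m, n, p, with zero for the axes which do not appear in the diagram.

```python
def homogenize(coefficients, numerical_term, cosine):
    """Replace the numerical term by numerical_term * cosine * (sum of
    all Landahl co-ordinates), which alters nothing because
    cosine * (l + m + n + ...) equals unity for an extended vector."""
    return [c + numerical_term * cosine for c in coefficients]


# Line 5, 4, 10, 9 on the LN diagram: .738 l + 1.327 n - .371 = 0.
line = homogenize([0.738, 0.0, 1.327, 0.0], -0.371, 0.5)
print([round(c, 4) for c in line])
```

The four numbers agree with the coefficients above to within a unit in the third decimal place, and the Landahl projections of any test on the line should reduce the left-hand side nearly to zero.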



Three more equations are needed, and one of them can 
indeed be obtained from the same diagram, on which 
points 5, 7, 8, 6 are very nearly collinear. The reader is 
advised to draw the remaining diagrams and complete the 
calculations following the steps of our previous example. 
The above equation refers to a line which makes a fairly 
big angle with N. It is desirable to look for the remaining 
three lines making large angles (approaching right angles) 
with L, M, and P. 

It will be remembered that in our earlier example the 
sign of one equation had to be changed at the end of the 
calculation because large negative values were appearing 
in the final matrix of loadings. This can be obviated 
by attending to the following rule. If the other test-points 
are on the same side of the line as the origin the numerical 
term must be positive in the equation ; if they are on the 
side remote from the origin the numerical term must be 
negative. In the adjacent diagram, the origin and the 
other points are on opposite sides of the line through 
5, 4, 10, 9 and therefore the numerical term must be 
negative, as it is (-.371). Had it been positive all the 
signs of the equation would have required to be changed. 

[Figure 29. The LN diagram.]

10. Ledermann's method of reaching simple structure. 
Ledermann has pointed out that when simple structure 
can be attained (whether orthogonal or oblique) then as 
many r-rowed principal minors of the reduced correlation 
matrix must vanish as there are common factors ; and 
that it follows that the same number of vanishing deter- 
minants must be discoverable in the table of centroid 
loadings. Thus, for example, in the table of centroid 
loadings on page 246 the three determinants composed 
respectively of rows 1, 2, and 4 ; of rows 1, 5, and 6 ; and 
of rows 3, 4, and 6 all vanish, and these rows are where the 
zeros come in the three columns of the simple structure. 
This gives an alternative method of reaching simple 
structure. Test every possible r-rowed determinant in 
the centroid table of r factors. If r of them are discovered 
to vanish, then simple structure may be and probably is 
possible. Each of these vanishing determinants will 
provide a column of the rotating matrix A, for which pur- 
pose we delete any one of its rows and calculate all the 
(r - 1)-rowed minors from what is left. The column has then to 
be normalized. This process works equally well for 
oblique simple structure (see Chapter XVIII). Its draw- 
back, when the number of factors is large, is the necessity 
of calculating so many determinants to discover those that 
vanish. 
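For three factors the search can be sketched as follows. The names are ours, the tolerance of .002 is an assumption made to allow for rounding of the loadings to three places, and the signed two-rowed minors of the two rows retained are obtained as a vector product.

```python
import math
from itertools import combinations

# Centroid loadings of the six-test example (signs as implied by the text).
CENTROID = [
    [0.542,  0.612,  0.074],
    [0.629,  0.342, -0.348],
    [0.529, -0.492,  0.191],
    [0.281, -0.182, -0.550],
    [0.628,  0.143,  0.274],
    [0.429, -0.424,  0.359],
]


def det3(a, b, c):
    """Determinant of the three-rowed matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def vanishing_triples(loadings, tolerance=2e-3):
    """Index triples (counting from 0) whose determinant vanishes."""
    return [triple for triple in combinations(range(len(loadings)), 3)
            if abs(det3(*(loadings[i] for i in triple))) < tolerance]


def rotation_column(loadings, triple):
    """Delete the last row of the triple, take the signed two-rowed
    minors of the two rows left, and normalize."""
    a, b = loadings[triple[0]], loadings[triple[1]]
    minors = (a[1] * b[2] - a[2] * b[1],
              a[2] * b[0] - a[0] * b[2],
              a[0] * b[1] - a[1] * b[0])
    norm = math.sqrt(sum(m * m for m in minors))
    return tuple(m / norm for m in minors)


triples = vanishing_triples(CENTROID)
print([tuple(i + 1 for i in t) for t in triples])
print([round(x, 3) for x in rotation_column(CENTROID, triples[0])])
```

The column obtained is determinate only up to a change of all its signs, which we are at liberty to make.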

11. Leading to oblique factors. In this chapter we have 
kept our factors orthogonal ; that is, independent, un- 
correlated with one another. It is natural to desire them 
to be different qualities, and convenient statistically. In 
describing a man, or an occupation, it would seem to be 
both confusing and uneconomical to use factors which, 
as it were, overlapped. Yet in situations where more 
familiar entities are dealt with, we do not hesitate to use 
correlated measures in describing a man. For instance, 
we give a man's height and weight, although these are 
correlated qualities. 

Often, moreover, a battery of tests which will not 
permit simple structure to be reached if orthogonal 
factors are insisted on will nevertheless do so if the factors 
are allowed to sag away a little from strict orthogonality. 
Even as early as in Vectors of Mind, Thurstone expressly 
permitted this. It can clearly be defended on the ground 
that even if the factors were uncorrelated in the whole 
population, they might well be correlated to some extent in 
the sample of people actually tested. I was at one time 
under the impression that this comparatively slight de- 
parture from orthogonality was all that was contemplated 
by Thurstone. But lately he has assured me in correspond- 
ence that he and his fellow-workers now have the courage 
of their convictions, and permit factors to depart from 
orthogonality as much as is necessary to attain simple 
structure, even if they are then found to be quite highly 
correlated. A chapter on these oblique factors* is there- 
fore necessary (Chapter XVIII), and out of them arise 
Thurstone's " second order factors." First, however, 
there is the chapter on " Limits to the Extent of Factors," 
retained unchanged from the first edition of this book, 
which, except in its last paragraph, deals only with 
orthogonal factors. 

* It must be clearly understood that this obliquity or correlation 
of factors is quite a different matter from the correlation of estimates, 
even of orthogonal factors, due to the excess of factors over tests, 
described on pages 120 to 129. 



CHAPTER XVII 

LIMITS TO THE EXTENT OF FACTORS* 

1. Boundary conditions in general. Before we discuss 
further the question whether a given set of common-factor 
loadings can be rotated into " simple structure " it is 
desirable to consider a wider problem, in itself quite 
unconnected with Thurstone's particular theory of factors ; 
the problem, namely, of drawing conclusions from correla- 
tion coefficients as to what there is in common between 
tests, or other variates. From one correlation coefficient, 
if it is significant in proportion to its standard error, it is 
natural to assume that the variates share some causal 
factor, though that factor may be a very abstract thing. 
But the circumstance that the correlation is not perfect 
shows that other causal factors too are at work. These 
may dilute the correlation in various ways. Some cause 
may be influencing the variate (1) but not the variate (2). 
Or vice versa some cause may be influencing (2) but not (1). 
Or both these things may be happening. Or some cause 
may be helping the one variate, and hindering the other. 
In any case, however, if the two variates are expressed 
as weighted sums of uncorrelated factors 

$$ z_1 = \sum_i l_i a_i, \qquad z_2 = \sum_j m_j b_j, $$

one at least of the factors a must be identical with one at 
least of the factors b, in order that any correlation may 
result. 

If we next consider three tests and low correlations (up 
to .5), we find great elasticity in the possible explanations.† 
Suppose all three correlations equal .5. We have, then, 
among innumerable possibilities, two extreme forms of 
explanation possible, one with only one general factor, 
the other with no general factor : 

$$
\left.
\begin{aligned}
z_1 &= .707a + .707s_1 \\
z_2 &= .707a + .707s_2 \\
z_3 &= .707a + .707s_3
\end{aligned}
\right\} \ \text{one general factor}
$$

or 

$$
\left.
\begin{aligned}
z_1 &= .707b + .707c \\
z_2 &= .707c + .707d \\
z_3 &= .707b + .707d
\end{aligned}
\right\} \ \text{no general factor}
$$

* Orthogonal factors, we must now say. 

† Brown and Thomson, page 142 ; Thomson, 1919b, Appendix 
J. R. Thompson. 

So long as the correlations do not average more than 
.5,* they can (usually) be imitated without a general 
factor, although one can be used if desired. That is, they 
can be imitated either by a three-factor (if we may so 
designate a factor running through three tests) or (usually) 
by two-factors running through only two tests, though 
in certain cases this may prove impossible, especially if the 
average correlation is not far below .5. 
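That both sets of equations give the same correlations can be confirmed by taking, for each pair of tests, the sum of the products of their loadings on the factors they share. A minimal sketch (the names are ours):

```python
def correlations(loadings):
    """Correlation of each pair of tests = inner product of their rows
    of loadings on uncorrelated factors."""
    n = len(loadings)
    return [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
             for j in range(n)] for i in range(n)]


# Columns: a, s1, s2, s3 (one general factor and three specifics).
one_general = [
    [0.707, 0.707, 0.0,   0.0],
    [0.707, 0.0,   0.707, 0.0],
    [0.707, 0.0,   0.0,   0.707],
]
# Columns: b, c, d (three two-factors, no general factor).
no_general = [
    [0.707, 0.707, 0.0],
    [0.0,   0.707, 0.707],
    [0.707, 0.0,   0.707],
]

for pattern in (one_general, no_general):
    for row in correlations(pattern):
        print(" ".join(f"{value:6.3f}" for value in row))
    print()
```

In both cases every correlation is .707 × .707, or .5 to three places, and every test has unit variance.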

As soon, however, as the average correlation rises above 
.5,† some use must be made of a three-factor general to 
all three tests, as the reader can readily convince himself 
by trial. In the above example, if we wish to increase 
the correlation of Tests 1 and 2 while using the second 
form of equations, we see that since we have exhausted 
all the variance on the factors b, c, and d, we can do so 
only by using either b in Test 2, or d in Test 1, and thus 
making it into a three-factor. 

2. The average correlation rule (Thomson, 1936b). When 
we have more tests, say n, then we can usually do without 
an n-factor (or general factor) so long as the average corre- 
lation does not exceed (n - 2)/(n - 1).‡ Again, of course, 
an n-factor may be used if desired, but its use is not usually 
compulsory, as it certainly is in some measure as soon as 
the average correlation rises past this point. Further, if 
the average correlation is still lower, we can in turn, as a 
rule, dispense with (n - 1)-factors as soon as the average 
sinks below (n - 3)/(n - 1), and with factors of less extent 
as it sinks still further. To know approximately what is 
the least-extensive kind of factors we can manage with, we 
have to see where the average correlation fits in, in the 
series of fractions 

$$ \frac{1}{n-1},\quad \frac{2}{n-1},\quad \frac{3}{n-1},\quad \dots,\quad \frac{n-3}{n-1},\quad \frac{n-2}{n-1}. $$

As soon as the average correlation rises past (n - p)/(n - 1), 
we can no longer have (p - 1) zeros in every column of 
the matrix of loadings. Usually (though not necessarily) 
we can manage to have (p - 1) zeros at or below that 
point. 

* This is an approximate condition. For an exact form, see the 
Mathematical Appendix, paragraph 20. See also later in this 
chapter. 

† See previous footnote. 

‡ Approximate condition, see previous footnote, and consult 
Appendix. 

The reason for this rule can be appreciated if we reflect 
that the highest possible correlations we can get with a 
given number of zero loadings will be reached by abolishing 
all factors of less extent. For example, with two-factors 
only, the highest possible correlations between five tests 
will be obtained by a pattern of loadings like this : 

xxxxoooooo 
xoooxxxooo 
oxooxooxxo 
ooxooxoxox 
oooxooxoxx 

If there are to be no specifics, and if we take the case 
where all the correlations are alike (which is in fact the 
maximum correlation possible), we see that the square of 
every loading must be 1/4, or in general 1/(n - 1). Each 
correlation will therefore be equal to 1/4 or 1/(n - 1). In 
the series of fractions 

$$ \frac{1}{4},\quad \frac{2}{4},\quad \frac{3}{4} $$

the average correlation just reaches the first, which can be 
considered as (n - p)/(n - 1), n being 5 and p being 4. 
And p - 1 or three zeros are just possible in each column of 
loadings. 

Again, consider five tests in which we use only three- 
factors. The maximum correlation is given by a pattern 
just like the last one, except that the noughts and crosses 
have to change places. Since there are six loadings, the 
square of every loading must be 1/6, and the pattern 
shows that every correlation is three times this, or 1/2. 
The average correlation, therefore, now reaches the next 
of the above fractions 

$$ \frac{1}{4},\quad \frac{2}{4},\quad \frac{3}{4} $$

and when 

$$ \frac{n - p}{n - 1} = \frac{2}{4} $$

we have p = 3 ; and p - 1 or two zeros are just possible 
per column (represented by the crosses in the former 
diagram), as we know is true from the way in which we 
made the correlations. 

It should be noted that the rule works with certainty only 
in one direction. What it asserts to be impossible, is 
impossible. But when it does not say that a given number 
of zero loadings per column is impossible, it is not certain 
to be possible. The rule is necessary, but not sufficient. 
Usually, however, it is a fairly safe guide, and when it does 
not say the zeros are impossible, they can generally be 
nearly if not quite reached, with the greater ease, of course, 
the more the average correlation falls below the critical 
value. 
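The rule can be put into a few lines as a sketch. The function names are ours, and the result is only the number of zeros per column which the rule does not deny; as has just been said, it does not assert that they can be attained.

```python
import math


def critical_fractions(n):
    """The series 1/(n-1), 2/(n-1), ..., (n-2)/(n-1)."""
    return [k / (n - 1) for k in range(1, n - 1)]


def zeros_not_denied(average_r, n):
    """Largest p - 1 such that the average correlation has not risen
    past (n - p)/(n - 1); a necessary condition only."""
    p = math.floor(n - average_r * (n - 1) + 1e-9)
    p = max(1, min(p, n))
    return p - 1


print(critical_fractions(5))
print(zeros_not_denied(0.25, 5))   # five tests, two-factors only
print(zeros_not_denied(0.50, 5))   # five tests, three-factors only
```

With five tests, an average of 1/4 leaves three zeros per column open and an average of 1/2 leaves two, in agreement with the two patterns discussed above.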

It should also be re-emphasized that these considerations 
have, so far, nothing to do with Thurstone's theory. In 
terms of our geometrical analogy, we are here considering 
the whole space (not merely a common-factor space) and 
asking whether orthogonal axes can be found each of 
which is at right angles to some of the test vectors. We 
are at liberty to take as many axes as we like, extending 
the dimensions of our space as we please. 

As an example, consider the set of correlations used in 
the last chapter : 

         1       2       3       4       5       6
1        .     .525    .000    .000    .448    .000
2      .525      .     .098    .306    .349    .000
3      .000    .098      .     .133    .314    .504
4      .000    .306    .133      .     .000    .000
5      .448    .349    .314    .000      .     .307
6      .000    .000    .504    .000    .307      .



The average correlation is .199, and n, the number of 
tests, is 6. The series of critical fractions is therefore 

$$ \frac{1}{5},\quad \frac{2}{5},\quad \frac{3}{5},\quad \frac{4}{5} $$

and the average correlation falls just short of the first one, 
for which, since n - p = 1, p = 5. This leaves open the 
possibility that we can use factors which have p - 1 or 
four zeros in each column of loadings, that is, that we can 
manage with two-factors each linking only two tests. But 
as .199 is so near to 1/5, and as the correlations are far 
from being all alike, we may expect to find this difficult 
or even not quite possible. Trial shows that we can nearly, 
but not quite, manage with two-factors. The following set 
of loadings, for example, while not perhaps the nearest 
approach to success, comes fairly close : 



                               Factor
Test     I     II    III    IV     V     VI    VII   VIII    IX
 1     .734  .679    .     .     .     .     .     .     .
 2     .658    .   .300  .318  .613    .     .     .     .
 3       .     .   .301    .     .   .278  .606  .682    .
 4       .     .     .   .895    .   .446    .     .     .
 5       .   .607    .     .   .504    .   .477    .   .387
 6       .     .     .     .     .     .     .   .682  .732



giving correlations : 

         1       2       3       4       5       6
1        .     .483    .000    .000    .412    .000
2      .483      .     .090    .285    .309    .000
3      .000    .090      .     .124    .289    .465
4      .000    .285    .124      .     .000    .000
5      .412    .309    .289    .000      .     .283
6      .000    .000    .465    .000    .283      .





which average .183 instead of .199. 

3. The latent-root rule (Thompson, 1929 ; Black, 1929 ; 
Thomson, 1936b ; Ledermann, 1936). A more scientific 
rule for ascertaining how " extensive " * the factors must 
be to explain the correlations is based upon the calculation 
of the largest " latent root " of the matrix of correlations. 
The exact calculation of the largest latent root is a very 
troublesome business, but luckily there are approximations. 
We have already met the term " latent root," in passing, 
in connexion with Hotelling's process.† 

If the largest latent root lies between the integers s and 
(s + 1), then s-factors are inadequate, and n - s zeros are 
impossible. Like the previous rule, this one is " neces- 
sary," but not " sufficient." It assures us that s-factors 
are inadequate, but it does not assure us that (s + 1)- 
factors are adequate, though they usually are if the latent 
root is not too near s + 1. 

The easiest approximation to the largest latent root is, 
when the correlations are positive, 

$$ \frac{\text{Sum of the whole matrix, including diagonal elements}}{n}. $$

In the case of the above example the whole matrix, 
including unities in the diagonal elements, sums to 11.972, 
so that the approximate largest latent root is 1.995, which 
leaves it just barely possible that two-factors will suffice. 
As we know by trial, they just won't. 

A better approximation is 

$$ \frac{\text{Sum of the squares of the column totals}}{\text{Sum of the whole matrix}}, $$

the diagonal elements being included for both numerator 
and denominator. (This quantity is, in fact, the sum of 
the squares of the first-factor loadings in Thurstone's 
" centroid " process.) 

* Meaning by an " extensive " factor one which has loadings in 
many tests. Thus a two-factor is less " extensive " than a three- 
factor, and so on. 

† See Chapter V, Section 4. 

In our example we have : 

           1.000    .525    .000    .000    .448    .000
            .525   1.000    .098    .306    .349    .000
            .000    .098   1.000    .133    .315    .504
            .000    .306    .133   1.000    .000    .000
            .448    .349    .315    .000   1.000    .308
            .000    .000    .504    .000    .308   1.000

Totals     1.973   2.278   2.050   1.439   2.420   1.812   = 11.972
Squares    3.893   5.189   4.203   2.071   5.856   3.283   = 24.495

Approximate largest latent root = 24.495 / 11.972 = 2.046 *

This time the better approximation definitely cuts out the 
possibility that two-factors will suffice. 
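Both approximations are easily computed, and the following sketch does so for the matrix above. The function names are ours. A simple iterative multiplication is added for comparison; it is not the method of calculation referred to in this chapter, but it approaches the largest latent root from below.

```python
import math

# The six-test correlation matrix with unity in each diagonal cell.
R = [
    [1.000, 0.525, 0.000, 0.000, 0.448, 0.000],
    [0.525, 1.000, 0.098, 0.306, 0.349, 0.000],
    [0.000, 0.098, 1.000, 0.133, 0.315, 0.504],
    [0.000, 0.306, 0.133, 1.000, 0.000, 0.000],
    [0.448, 0.349, 0.315, 0.000, 1.000, 0.308],
    [0.000, 0.000, 0.504, 0.000, 0.308, 1.000],
]


def column_totals(matrix):
    return [sum(column) for column in zip(*matrix)]


def first_approximation(matrix):
    """Sum of the whole matrix divided by n."""
    return sum(column_totals(matrix)) / len(matrix)


def second_approximation(matrix):
    """Sum of squares of the column totals over the sum of the matrix."""
    totals = column_totals(matrix)
    return sum(t * t for t in totals) / sum(totals)


def iterated_root(matrix, steps=200):
    """Repeated multiplication of a trial vector by the matrix,
    returning the Rayleigh quotient of the final vector."""
    x = [1.0] * len(matrix)
    for _ in range(steps):
        y = [sum(a * b for a, b in zip(row, x)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    y = [sum(a * b for a, b in zip(row, x)) for row in matrix]
    return sum(a * b for a, b in zip(x, y))


print(round(first_approximation(R), 3))
print(round(second_approximation(R), 3))
print(round(iterated_root(R), 3))
```

The first two figures printed are the approximations 1.995 and 2.046 of the text.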

4. Application to the common-factor space. All of the 
above applies to factors in general, and the calculations 
are carried out with unity in each diagonal cell. To apply 
these rules to the problem of the attainability of " simple 
structure," we have to adapt them to the common-factor 
space. For this purpose they must be applied either to 
the matrix with correlations " corrected " for communality 
(the best plan), or with certain modifications to the matrix 
with communalities in the diagonal. By "correcting" 
a correlation coefficient for communality is meant dividing 
it by the square root of each of the communalities of the 
two tests concerned. The result is the correlation which 
would ensue if the specifics were abolished. 
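The correction is a single division for each coefficient, as in the sketch below. The function name is ours; the communalities are those of the six tests of our example, and the correlations are the ordinary ones already tabulated.

```python
import math

COMMUNALITIES = [0.674, 0.634, 0.558, 0.415, 0.490, 0.493]

# Ordinary correlations of the six tests (diagonal left at zero here).
CORRELATIONS = [
    [0.000, 0.525, 0.000, 0.000, 0.448, 0.000],
    [0.525, 0.000, 0.098, 0.306, 0.349, 0.000],
    [0.000, 0.098, 0.000, 0.133, 0.314, 0.504],
    [0.000, 0.306, 0.133, 0.000, 0.000, 0.000],
    [0.448, 0.349, 0.314, 0.000, 0.000, 0.307],
    [0.000, 0.000, 0.504, 0.000, 0.307, 0.000],
]


def correct_for_communality(correlations, communalities):
    """Divide r_ij by the square roots of both communalities and
    put unity in each diagonal cell."""
    n = len(correlations)
    return [[1.0 if i == j else
             correlations[i][j] / math.sqrt(communalities[i] * communalities[j])
             for j in range(n)] for i in range(n)]


for row in correct_for_communality(CORRELATIONS, COMMUNALITIES):
    print(" ".join(f"{value:6.3f}" for value in row))
```

A corrected coefficient may differ by a unit or two in the third decimal place from a printed value, according to the number of places carried in the communalities.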

In the case of our small example, the correlations 
" corrected " for communality are : 





          1       2       3       4       5       6
1       1.000    .803    .000    .000    .780    .000
2        .803   1.000    .165    .597    .626    .000
3        .000    .165   1.000    .276    .601    .961
4        .000    .597    .276   1.000    .000    .000
5        .780    .626    .601    .000   1.000    .626
6        .000    .000    .961    .000    .626   1.000

The average of the corrected coefficients is .362. In the 
series of fractions with denominator (n - 1) 

$$ \frac{1}{5},\quad \frac{2}{5},\quad \frac{3}{5},\quad \frac{4}{5} $$

this value .362 is below 2/5, or (n - p)/(n - 1) where 
n = 6 tests. We see, therefore, that p = 4, and that the 
possibility of having p - 1, or three zeros in every column, 
is not denied. This is in agreement with the analysis (an 
orthogonal " simple structure ") arrived at in Chapter XVI, 
page 254 (and see page 247). 

* The exact value to three places of decimals calculated by the 
method given in Aitken, 1937b, 284 ff., is 2.086. 

The first approximation to the largest latent root of the 
matrix with correlations " corrected " for communality 
and with unity in each diagonal cell gives 

$$ \frac{\text{Sum of whole matrix}}{n} = \frac{16.870}{6} = 2.812 $$

and as this is less than 6 - 3, three zeros are still possible 
in each column. The more accurate approximation to the 
root 

$$ \frac{\text{Sum of the squares of the column totals}}{\text{Sum of the whole matrix}} = \frac{49.2718}{16.870} = 2.921 $$

shows by its nearness to 6 - 3 that three zeros, if they are 
possible (and we know they are), must just barely be 
possible.* 

Instead of applying the latent-root test to the matrix 
corrected for communality, we can apply it to the matrix 
of ordinary correlations, with the communalities in the 
diagonal cells, but with the following change. Instead 
of comparing the latent root with the series of integers 
1, 2, 3 ... we have to compare it with the sum of 
1, 2, 3 ... communalities, taking these in their order of 
magnitude, largest first (Ledermann). We shall illustrate
this on the same example. The matrix of ordinary
correlations, with communalities, is:

            1       2       3       4       5       6
1        .674    .525    .000    .000    .448    .000
2        .525    .634    .098    .306    .349    .000
3        .000    .098    .558    .133    .314    .504
4        .000    .306    .133    .415    .000    .000
5        .448    .349    .314    .000    .490    .000
6        .000    .000    .504    .000    .000    .493

Sums    1.647   1.912   1.607    .854   1.601    .997   =  8.618
Squares 2.713   3.656   2.582    .729   2.563    .994   = 13.237

    Approximate largest root = 13.237 / 8.618 = 1.536

* Exact root is 2.954. It is tempting to surmise that Thurstone's
search for unique orthogonal simple structure is really a search for
a matrix, corrected for communality, with an integral largest root,
equal to n − r; but it must be remembered that the criterion though
necessary is not sufficient when the number of factors is restricted
to r.

The communalities arranged in order of magnitude and
summed are:

                    1       2       3       4       5       6
Communality       .674    .634    .558    .493    .490    .415
Continued sum     .674   1.308   1.866   2.359   2.849   3.264

The latent root 1.536 is larger than the second of these
but less than the third, so the possibility of three zeros 
per column is left open, in agreement with the former tests 
and with the known facts. It would seem from the present 
writer's experience, however, that the test applied to the 
ordinary matrix in this way does not always agree exactly 
with that applied to the matrix with correlations corrected 
for communality, and that the latter is more accurate. 
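The comparison with the continued sums of communalities can be sketched as below. This is a minimal illustration under the assumption that the approximate root is formed exactly as above (squared column sums over the grand sum); the variable names are invented for the purpose.

```python
from itertools import accumulate

# Ordinary correlations with communalities in the diagonal, as printed.
R = [
    [0.674, 0.525, 0.000, 0.000, 0.448, 0.000],
    [0.525, 0.634, 0.098, 0.306, 0.349, 0.000],
    [0.000, 0.098, 0.558, 0.133, 0.314, 0.504],
    [0.000, 0.306, 0.133, 0.415, 0.000, 0.000],
    [0.448, 0.349, 0.314, 0.000, 0.490, 0.000],
    [0.000, 0.000, 0.504, 0.000, 0.000, 0.493],
]
n = len(R)

sums = [sum(row) for row in R]
approx_root = sum(s * s for s in sums) / sum(sums)

# Communalities, largest first, and their continued (running) sums.
communalities = sorted((R[i][i] for i in range(n)), reverse=True)
running = [round(x, 3) for x in accumulate(communalities)]

# How many of the continued sums does the root exceed?
exceeded = sum(1 for c in running if approx_root > c)

print(round(approx_root, 3), running, exceeded)
```

Here the root exceeds the first two continued sums only, which is the situation the text describes as leaving three zeros per column possible.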

5. A more stringent test. The above tests only refer to
the possibility of obtaining the required number of zero
loadings with orthogonal factors ("orthogonal simple
structure"). Even when orthogonal simple structure cannot be
reached, it may be possible to attain simple structure with
oblique factors.

Moreover, the approximations used for the largest latent 
root above are only valid, in general, when all the correla- 
tions are positive. In view of the fact, however, that few 
psychological correlations are negative this is not a great 
difficulty. 

Further, while these tests show definitely when orthogonal
simple structure cannot be attained, it does not
follow with certainty that it can actually be reached when
the tests are satisfied, though it usually can.

An exact criterion has been given (Ledermann, 1936), 
and is described in the Appendix, pages 377-8, which 
avoids all the above defects. It requires at present, 
however, a prohibitive amount of calculation. 

In general, simple structure will be attainable with a 
battery of tests only when the battery has been picked 
with that end in view. There is a certain incompatibility 
about Thurstone's demands which makes their fulfilment 
only possible in special circumstances. He wants as few 
common factors as possible to explain the correlations ; 
but he wants these common factors to have no loadings 
in a large number of the tests. This is rather like wanting 
to run a school with as few teachers as possible, but each 
teacher to have a large number of free periods. If we 
begin by reducing the number of common factors to its 
minimum (as Thurstone does), we will generally find that 
the second requirement cannot be fulfilled. It can, how- 
ever, be fulfilled in some cases, and it is exactly these 
cases which Thurstone relies on to define his primary 
factors. It is his faith that factors found in this mathema- 
tical way will turn out to be acceptable to the psychologist 
as psychological entities. 



CHAPTER XVIII

OBLIQUE FACTORS, AND CRITICISMS 

1. Pattern and structure. So long as the factors are 
orthogonal, the loadings in the matrix of loadings are also 
the correlations between the factor and the tests, but this 
ceases to be the case when the factors are correlated. The 
word "loading" continues to be used for the coefficients
such as l, m, and n in equations like

    z = lα + mβ + nγ

and the matrix or table of these is called a pattern, while 
the matrix of correlations between tests and factors is 
called a structure. Thus of the two matrices on page 182 
(Chapter XI, Section 6), the upper one is both a pattern 
and a structure, for the factors are orthogonal, whereas 
the lower one is a structure only. From the upper table 
we can say that 



using the correlations of the factors with Test 1 as coeffi- 
cients in a linear equation for that test score. But we 
cannot say from the lower table that 

    z₁ = .51f₁ + .25f₂ + .40f₃

The correlations here cannot serve as coefficients. 

Moreover, as soon as the factors become oblique, it 
becomes necessary to distinguish between " reference 
vectors " and " primary factors." The reference vectors 
are the positions to which the centroid axes have been 
rotated so that the test-projections on to them include a 
number of zeros. Each reference vector is at right angles 
to a hyperplane containing a number of communality 
vectors. A hyperplane is a space of one dimension less 
than the common-factor space. In our first example in 
Chapter XVI the hyperplanes were ordinary planes, the
faces of the three-cornered pyramid there referred to (see
page 250) and each reference vector was at right angles to 
one of those faces. 

The primary factor corresponding to a given reference 
vector is the line of intersection of all the other
hyperplanes, excluding, that is, the hyperplane at right angles to
the reference vector. In our three-dimensional common- 
factor space the primary factor was the edge of the pyra- 
mid where those two faces met, excluding that face to 
which the reference vector was orthogonal. 

Now, when the reference vectors turn out to be at right 
angles to each other, as they did in that example, each 
reference vector is identical with its own primary factor 
(compare page 254 in Chapter XVI). But not when the 
reference vectors turn out to be oblique. In Chapter 
XVI we did not distinguish them, and called their common 
line the " factor." But in this chapter the distinction 
must be kept clearly in mind. It is the primary factors 
Thurstone wants. The reference vectors are only a 
means to an end. 

Thurstone's second method of rotation described in 
Chapter XVI, the method in which the communality
vectors are " extended," and lines drawn on the diagrams 
which are not necessarily radial lines, will not keep the 
axes orthogonal, but seeks for the axes on which a number 
of projections are zero, regardless of whether the resulting 
directions are orthogonal or oblique. In general they will 
be oblique, and the examples worked in Chapter XVI only 
gave orthogonal simple structure because they had been 
devised so as to do so. The test of orthogonality is that 
the matrix of rotation, premultiplied by its transpose, 
gives the unit matrix (see page 254). Or in other words, 
that the inner products of the columns of the rotating 
matrix are all zero. They are the cosines of the angles
between the reference vectors, and the cosine of 90° is
zero.

2. Three oblique factors. To illustrate Thurstone's 
method when the resulting factors are oblique we shall 
next work an example devised to give three oblique 
common factors. Consider this matrix of correlations:

         1       2       3       4       5       6       7
1        -     .728    .167    .372    .153    .105    .126
2      .728      -     .696    .583    .651    .347    .638
3      .167    .696      -     .857    .775    .709    .740
4      .372    .583    .857      -     .543    .797    .473
5      .153    .651    .775    .543      -     .504    .828
6      .105    .347    .709    .797    .504      -     .433
7      .126    .638    .740    .473    .828    .433      -





which, with guessed communalities, gives these centroid 
loadings : 

F

         I       II      III
1      .449   −.682    .165
2      .825   −.478   −.129
3      .906    .336    .020
4      .846    .133    .457
5      .808    .208   −.412
6      .697    .336    .335
7      .767    .173   −.468



When these projections on the centroid axes are "extended,"
that is, when each row is divided by the first
loading in that row, we obtain this table:

         Ie      IIe     IIIe
1      1.000  −1.519    .367
2      1.000   −.579   −.156
3      1.000    .371    .022
4      1.000    .157    .540
5      1.000    .257   −.510
6      1.000    .482    .481
7      1.000    .226   −.610



The columns IIe and IIIe in this table represent the
co-ordinates of the "dots on the ceiling" in our analogy of
Chapter XVI, p. 250. When we make a diagram of them we
obtain Figure 30. We see that a triangular formation is
present, and we draw the dotted lines shown.

It is not essential, it may be remarked in passing, that
there be no points elsewhere than on the lines, provided
they are additional to those required to fix the simple
structure. Had it not been for the desirability of keeping
the example small we would have increased the number of
tests, and not only arranged for further points to fall on
these lines, but also included some whose dots fell inside
the triangle, representing tests which involve all three
factors.

[Figure 30]

We find the equations of these lines to be approximately

     .475 +   .50y +   .95z = 0     (line 1, 2, 7)
    1.113 +  .183y − 2.119z = 0     (line 1, 4, 6)
     .403 − 1.091y +  .256z = 0     (line 7, 5, 3, 6)

The coefficients of each equation have to be "normalized,"
that is, reduced proportionately so that the sum of
their squares is unity (for they are to be direction cosines).
These normalized coefficients are then written as columns
in a matrix as follows:

    .405    .464    .338
    .426    .076   −.916     = A
    .809   −.883    .215
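The normalizing step is easy to reproduce. The sketch below, with invented names, takes the three sets of line coefficients quoted above, scales each to unit length, and writes the results as the columns of the rotating matrix.

```python
from math import sqrt

# Coefficients (constant, y, z) of the three dotted lines, as quoted.
lines = [
    (0.475, 0.50, 0.95),      # line 1, 2, 7
    (1.113, 0.183, -2.119),   # line 1, 4, 6
    (0.403, -1.091, 0.256),   # line 7, 5, 3, 6
]

def normalize(coeffs):
    """Scale the coefficients so that the sum of their squares is unity."""
    length = sqrt(sum(c * c for c in coeffs))
    return [c / length for c in coeffs]

columns = [normalize(c) for c in lines]

# Write the normalized coefficients as columns of the rotating matrix A.
A = [[columns[j][i] for j in range(3)] for i in range(3)]

for row in A:
    print(["%.3f" % x for x in row])
```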



The table of centroid loadings on page 274 must now be 
post-multiplied by this rotating matrix to obtain the 
projections of the tests on the three reference vectors which 
are at right angles to the planes defined by the dotted lines 
in our diagram. We obtain this table:

V = FA
(Simple) Structure on the Reference Vectors

         L'      B'      D'
1      .025    .011    .812
2      .026    .460    .689
3      .526    .428    .003
4      .769    .001    .262
5      .083    .755    .006
6      .696    .053    .000
7      .006    .782    .000



We have labelled the columns L', B', and D' for a reason
which will become apparent later, when we explain how
the correlations were, in fact, made. This table is a simple 
structure, formed by the projections on the reference 
vectors. It has a zero (or near-zero) in each row, and 
three or more in each column, in the positions to be 
anticipated from Figure 30 ; for example, tests 3, 5, 6, 
and 7, which are collinear in the figure, have zeros in 
column D'. 
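The post-multiplication can be checked with a short sketch. One assumption must be stated loudly: the signs of the centroid loadings used below are a reconstruction, chosen so that the product reproduces the printed structure and the one signed entry of the extended table; they are not read directly from the printed page. The 0.1 threshold for "near-zero" is likewise only illustrative.

```python
# Centroid loadings F (signs are an ASSUMPTION, see the note above).
F = [
    [0.449, -0.682, 0.165],
    [0.825, -0.478, -0.129],
    [0.906, 0.336, 0.020],
    [0.846, 0.133, 0.457],
    [0.808, 0.208, -0.412],
    [0.697, 0.336, 0.335],
    [0.767, 0.173, -0.468],
]

# Rotating matrix A: columns are the normalized line coefficients.
A = [
    [0.405, 0.464, 0.338],
    [0.426, 0.076, -0.916],
    [0.809, -0.883, 0.215],
]

# V = FA: projections of the tests on the three reference vectors.
V = [[sum(F[t][k] * A[k][j] for k in range(3)) for j in range(3)]
     for t in range(len(F))]

# Tests whose projection on each reference vector is near zero.
near_zero = [[t + 1 for t in range(len(V)) if abs(V[t][j]) < 0.1]
             for j in range(3)]

for name, tests in zip(("L'", "B'", "D'"), near_zero):
    print(name, tests)
```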

Now let us test the angles between the reference vectors. 
To do this we premultiply the rotating matrix by its 
transpose 

A'A = C

    .405    .426    .809       .405    .464    .338         1     −.494   −.079
    .464    .076   −.883   ×   .426    .076   −.916   =   −.494     1     −.103
    .338   −.916    .215       .809   −.883    .215       −.079   −.103     1

This gives the cosines of the angles between the reference
vectors and we see that they are obtuse. The angles are
approximately:

      -     120°     95°
    120°      -      96°
     95°     96°      -

As soon as we know that the reference vectors are not
orthogonal, we have to take account of the fact that the 
primary factors are not identical with them. Each prim- 
ary factor is the line in which the hyperplanes intersect, 
excluding that hyperplane to which the corresponding 
reference vector is orthogonal. In a three-dimensional 
common-factor space like ours the primary factors lie 
along the edges of the pyramid which the extended vectors 
form. 
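The cosines and angles between the reference vectors quoted above can be recomputed as follows. The sketch uses the rotating matrix A as printed; the helper names are invented, and the clamp before the arc-cosine is there only because three-figure rounding can push a diagonal product fractionally above 1.

```python
from math import acos, degrees

# Rotating matrix A (columns are the reference vectors).
A = [
    [0.405, 0.464, 0.338],
    [0.426, 0.076, -0.916],
    [0.809, -0.883, 0.215],
]

def column(m, j):
    return [row[j] for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# C = A'A: inner products of the columns, i.e. cosines of the angles.
C = [[dot(column(A, i), column(A, j)) for j in range(3)] for i in range(3)]

def angle(cosine):
    """Angle in degrees, clamping rounding overshoot into [-1, 1]."""
    return degrees(acos(max(-1.0, min(1.0, cosine))))

pairs = [(0, 1), (0, 2), (1, 2)]
cosines = [round(C[i][j], 3) for i, j in pairs]
angles = [round(angle(C[i][j])) for i, j in pairs]

print(cosines, angles)
```

Had the reference vectors been orthogonal, every off-diagonal cosine would have been zero and C would have been the unit matrix.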

Let us return to our mental picture, which the reader 
can place in the room in which he is sitting. The origin, 
immediately below the point in Figure 30, is in the middle 
of the carpet. Figure 30 itself is on the ceiling, seen from 
above as though translucent. The radial lines with 
arrowheads are the projections of the primary factors on 
to the ceiling. The projections of the reference vectors 
are not drawn, to avoid confusion in the figure. They 
are near, but not identical with, the primary factors. 

The reader should not be misled by the fact that two of 
the primary factors lie along the same lines as Tests 1 
and 7. It was necessary to allow this in devising an ex- 
ample with very few tests in it (to avoid much calculation 
and printing large tables). But with a large number of 
tests the lines of the triangle could have been defined 
without any test being actually at a corner. 

3. Primary factors and reference vectors. At about this 
stage a disturbing thought may have occurred to the 
reader. We have sought for, and obtained, simple 
structure on the reference vectors. That is to say, we 
have found three vectors, three imaginary tests, which are 
uncorrelated each with a group of the actual tests, namely
where there are zeros in the table on page 276. The entries 
in that table are the projections of the actual tests on the 
reference vectors. 

But the primary factors are different from the reference 
vectors. The projections of the tests on to the primary 
factors will be different and will not show these zeros. 
Those projections are, in fact, given in this table (never 
mind for the moment how it is arrived at):

Structure on the Primary Factors

         L       B       D
1      .160    .162    .832
2      .408    .666    .793
3      .866    .809    .176
4      .934    .495    .401
5      .541    .927    .152
6      .842    .472    .132
7      .468    .915    .150



These numbers are the correlations between the primary 
factors and the tests, and none is zero. The primary 
factor structure is not " simple," it is the reference vector 
structure that is simple. Why then not use the reference 
vectors as our factors ? 

A two-fold answer can be given to this, one general, the 
other particular to this example. The latter will become 
clear when we divulge how the example was made. The 
former requires us to return to the distinction between 
structure and pattern. A structure is a table of correlations, 
a pattern is a table of coefficients in a " specification " 
equation specifying how a test score is made up by factors. 
The entries in a pattern are loadings or saturations of the 
tests with the factors, but not correlations. 

Pattern and structure are only identical when the 
reference vectors are orthogonal and coincide with the 
primary factors. When the reference vectors are oblique 
(usually at obtuse angles) the primary factors are different 
and are themselves usually at acute angles. When the 
primary factors and reference vectors thus separate, the 
structure of the reference vectors and the pattern of the primary 
factors are identical except for a coefficient multiplying 
each column ; and vice versa the structure of the primary 
factors is identical (except for similar coefficients) with the 
pattern of the reference vectors. In particular, where 
there are zeros in the reference vector structure there will 
also be zeros in the primary factor pattern. The general 
theorem of the reciprocity of reference vectors and primary
factors (to use our present terms), that is, the reciprocity
of (a) a set of lines orthogonal to hyperplanes, and (b)
another set of lines which are the intersections in each case
of the remaining hyperplanes, is an instance of the reci-
procity which runs through the whole of n-dimensional
geometry between hyperplanes of k dimensions and of
(n − k) dimensions. It occurs in several other places in the
geometry of factorial analysis : for instance, tests, persons, 
and factors are all in one sense reciprocal and exchangeable. 
The particular fact about the zeros in the primary factor 
pattern can be seen readily from the geometrical analogy. 
For a test vector which lies in a hyperplane can be com- 
pletely defined as a weighted resultant of the primary 
factors which are also in that hyperplane, without any 
assistance from other primary factors. In our drawing 
of the reader's study, for example, on page 250, the vector
of the Test 2, which is the line O2, lies upon the plane
face O14 of the pyramid, and can be completely described
by a weighted sum of the primary factors along the edges
O1 and O4, without bringing in the edge O6 at all. The
primary factor which lies along that edge will therefore 
have a zero weight in the row of the pattern which speci- 
fies Test 2. This pattern on the primary factors will be 
very similar to the structure on the reference vectors 
already given for our example in the table on page 276. 
It can, in fact, be calculated from that table by multiplying
the first column by 1.163, the second column by 1.166,
and the third by 1.017, giving the following:

FAD⁻¹
(Simple) Pattern on the Primary Factors

         L       B       D
1      .029    .013    .826
2      .030    .536    .701
3      .612    .499    .003
4      .895    .001    .266
5      .096    .880    .006
6      .809    .062    .000
7      .008    .912    .000

Thus although the primary factors differ from the
reference vectors (the angles between the primary factors
and their corresponding reference vectors are, in fact, 31°,
31°, and 11°), yet if the structure on the reference vectors
is "simple," the pattern on the primary factors will be
"simple." The entries in the above table can be used as
coefficients in specification equations, and if for clearness
we omit the near-zero coefficients entirely, we have found
that the test-scores can be considered as made up thus:

    Score in Test 1 =                   .826d + Specific
    Score in Test 2 =         .536b  +  .701d + Specific
    Score in Test 3 = .612l + .499b           + Specific
    Score in Test 4 = .895l          +  .266d + Specific
    Score in Test 5 =         .880b           + Specific
    Score in Test 6 = .809l                   + Specific
    Score in Test 7 =         .912b           + Specific

4. Behind the scenes. It is now time to divulge what 
these " tests " really are and how the " scores " were 
made whose correlations we have been analysing, and to 
compare our analysis with the reality. The example is a 
simpler and shorter variety of a device used by Thurstone 
and published in April 1940 in the Psychological Bulletin. 
The measurements behind the correlations were not made
on a number of persons, but were made on a number of
boxes (only eight boxes, to keep down the amount of
calculation and printing). These boxes were of the
following dimensions:

Box     Length   Breadth   Depth
1          2        2        1
2          3        2        3
3          3        2        2
4          6        3        2
5          4        4        2
6          5        3        1
7          5        4        3
8          4        4        2

Sum       32       24       16
Mean       4        3        2

The "tests" were seven functions of these dimensions,
and are shown in the next table, which also shows the
score each box (or "person") would achieve in that test.
It is as though someone was unable for some reason to
measure the primary length, breadth, and depth of these
boxes (as we are unable to measure the primary factors
of the mind directly) but was able to measure these more
complex quantities like LB, or √(L² + D²) (as we are
able to measure scores in complex tests):

                               Boxes = Persons
Test  Formula          1     2     3     4     5     6     7     8     Sum     Mean
1     D²               1     9     4     4     4     1     9     4      36    4.500
2     BD               2     6     4     6     8     3    12     8      49    6.125
3     LB               4     6     6    18    16    15    20    16     101   12.625
4     √(L² + D²)    2.24  4.24  3.61  6.32  4.47  5.10  5.83  4.47   36.28    4.535
5     L + B²           6     7     7    15    20    14    21    20     110   13.750
6     L² + D           5    12    11    38    18    26    28    18     156   19.500
7     B                2     2     2     3     4     3     4     4      24    3.000



With these scores the sums of squares and products of
deviations from the mean are:

          1       2       3       4       5       6       7
1      66.0    50.5    22.5    10.2    25.0    29.0     3.0
2      50.5    72.9    98.4    16.8   112.3   100.5    16.0
3      22.5    98.4   273.9    47.9   259.2   398.5    36.0
4      10.2    16.8    47.9    11.4    37.0    91.3     4.7
5      25.0   112.3   259.2    37.0   283.5   288.0    41.0
6      29.0   100.5   398.5    91.3   288.0   800.0    36.0
7       3.0    16.0    36.0     4.7    41.0    36.0     6.0
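The scores and the table of sums of squares and products can be regenerated from the box dimensions. The following is a sketch with invented names; the seven formulae are those listed in the table of tests, and no specific or error factors are added at this stage.

```python
from math import sqrt

# (Length, Breadth, Depth) of the eight boxes.
boxes = [(2, 2, 1), (3, 2, 3), (3, 2, 2), (6, 3, 2),
         (4, 4, 2), (5, 3, 1), (5, 4, 3), (4, 4, 2)]

# The seven "tests" as functions of the box dimensions.
tests = [
    lambda L, B, D: D ** 2,
    lambda L, B, D: B * D,
    lambda L, B, D: L * B,
    lambda L, B, D: sqrt(L ** 2 + D ** 2),
    lambda L, B, D: L + B ** 2,
    lambda L, B, D: L ** 2 + D,
    lambda L, B, D: B,
]

# scores[t][b] is the score of box b in test t.
scores = [[f(*box) for box in boxes] for f in tests]

def deviations(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]

dev = [deviations(s) for s in scores]

# Sums of squares (diagonal) and products (off-diagonal) of deviations.
products = [[sum(a * b for a, b in zip(dev[i], dev[j]))
             for j in range(len(tests))] for i in range(len(tests))]

for row in products:
    print(["%.1f" % x for x in row])
```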



From these the correlations could be calculated by dividing 
each row and column by the square root of the diagonal 
cell entry. But that would make no allowance for specific 
factors, which in all actual psychological tests play a 
considerable part. In the example devised by Thurstone 
on which this is modelled there are no specific factors, but 
it was decided to introduce them here into tests 5, 6, and 7, 
by increasing their sums of squares. In addition, by an 
arithmetical slip, a small group factor was added to these 
three tests, and this was not discovered for some time. It 
was decided to leave it, for in a way it makes the example
more realistic, and may be taken to represent an experi-
mental error of some sort running through these three tests.
With these changes, the correlations are found, and are
those with which we began this chapter and which we have
already analysed into three oblique factors L, B, and D.
Let us now compare that analysis with the formulae which
we now know to represent the tests. The pattern on
page 279, for example, shows that Test 2 depends only on
factors B and D: and that is correct, for it was, in fact,
their product BD, and L did not enter into it. The
analysis gives the test score as a linear function of B and D,

    .536b + .701d

whereas it was really a product. But the analysis was 
correct in omitting L. Similarly, the analyses into the 
other factors can be compared with the actual formulae, 
and in almost every case the factorial analysis, except for 
being linear, is in agreement with the actual facts. Tests 5 
and 6, true, appear in the analysis to omit factors L and D 
respectively, although these dimensions figured in their 
formulae. But it would appear that they were swamped 
by reason of the other dimension in the formulae being 
squared ; and also possibly the specific and error factors 
we added did something towards obscuring smaller details. 
Also the process of " guessing " communalities, though 
innocuous in a battery of many tests, is a source of con- 
siderable inaccuracy when, as here, the tests are few. 

5. Box dimensions as factors. We can now explain the 
particular reason for selecting the primary factors, and not 
the reference vectors, as our fundamental entities. The 
fundamental entities in the present example can reason- 
ably be said to be the length, breadth, and depth of the 
boxes, given in the table on page 280. Now, the columns 
of that table are correlated with one another, as the reader 
can readily check, the correlation coefficients being 

    L with B, .589
    L with D, .144
    B with D, .204

These correlations are due to the fact that a long box
naturally tends to be large in all its dimensions. It could,
of course, be very, very shallow, but usually it is deep and
broad.
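The reader who wishes to check these three coefficients can do so with a few lines of Python. The sketch below (names invented) computes the ordinary product-moment correlation between the columns of the table of box dimensions.

```python
from math import sqrt

# (Length, Breadth, Depth) of the eight boxes.
boxes = [(2, 2, 1), (3, 2, 3), (3, 2, 2), (6, 3, 2),
         (4, 4, 2), (5, 3, 1), (5, 4, 3), (4, 4, 2)]

def correlation(x, y):
    """Product-moment correlation of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

length, breadth, depth = zip(*boxes)

r_lb = correlation(length, breadth)
r_ld = correlation(length, depth)
r_bd = correlation(breadth, depth)

print(round(r_lb, 3), round(r_ld, 3), round(r_bd, 3))
```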

The reference vectors were, it is true, correlated, but 
negatively. They were at obtuse angles with one another 
(see page 276) and obtuse angles have negative cosines 
corresponding to negative correlations. So the reference 
vectors do not correspond to the fundamental dimensions 
length, breadth, and depth. 

What, then, are the angles and hence the correlations 
between the primary factors ? We shall find that they 
are acute angles, and their cosines agree reasonably well 
with the above correlations between the length, breadth, 
and depth. The algebraic method of finding these angles 
is given in the mathematical appendix, but it is perhaps 
desirable to give a less technical account of it here. We 
need the direction-cosines of the primary factors, that is, 
the cosines of the angles they make with the orthogonal 
centroid axes. Each primary factor is the intersection
of n − 1 hyperplanes; in our simple case it is the intersection
of two planes.

In n-dimensional geometry a linear equation defines a
hyperplane of n − 1 dimensions. For example, in a plane
of two dimensions a linear equation is a line (of one
dimension), hence the name linear. But in a space of three
dimensions a "linear" equation like ax + by + cz = d
is a plane. Two such equations define the line which is 
the intersection of two planes. 

Now, the equations of the three planes which form the 
triangular pyramid of which we have previously spoken 
are just those equations we have already obtained and 
used in our example, viz. : 

    .405x + .426y + .809z = 0
    .464x + .076y − .883z = 0
    .338x − .916y + .215z = 0

These equations taken two at a time define the three
edges of the pyramid, which are our primary factors, and
if we express each pair in the form

    x/a = y/b = z/c

then the direction cosines are proportional to a, b, and c,
which only require normalizing to be the direction cosines.
When the direction cosines are found in this way, and 
written in columns to form a matrix, they prove to have 
the values 



    .797    .835    .503
    .400    .187   −.843
    .453   −.517    .192

This is the rotating matrix to obtain the projections, i.e.
the structure, on the primary factors, and if the centroid
loadings on page 274 are post-multiplied by this there
results the table we have already quoted on page 278.
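In three dimensions the line of intersection of two planes through the origin is given by the vector (cross) product of their normals, so the direction cosines can be sketched as below. This is an illustration rather than the appendix method; the names are invented, and the choice of sign (each primary factor taken to make an acute angle with its own reference vector) is an assumption made here.

```python
from math import sqrt

# Columns of A are the normals of the three planes (reference vectors).
A = [
    [0.405, 0.464, 0.338],
    [0.426, 0.076, -0.916],
    [0.809, -0.883, 0.215],
]

def column(m, j):
    return [row[j] for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def normalize(v):
    length = sqrt(dot(v, v))
    return [x / length for x in v]

primary = []
for k in range(3):
    # The primary factor lies in the two planes other than plane k.
    others = [column(A, j) for j in range(3) if j != k]
    direction = normalize(cross(others[0], others[1]))
    # Assumed sign convention: acute angle with its own reference vector.
    if dot(direction, column(A, k)) < 0:
        direction = [-x for x in direction]
    primary.append(direction)

# Write the direction cosines as columns of the rotating matrix T.
T = [[primary[j][i] for j in range(3)] for i in range(3)]

for row in T:
    print(["%.3f" % x for x in row])
```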

The above matrix, premultiplied by its transpose, gives 
the cosines of the angles between the primary factors. We 
obtain 



     1      .506    .150
    .506     1      .164     = DC⁻¹D
    .150    .164     1

Compare these with the correlations between the columns
of dimensions of the boxes, viz.:

     1      .589    .144
    .589     1      .204
    .144    .204     1

The resemblance is quite good, and shows that it is the 
primary factors, and not the reference vectors, which 
represent those fundamental although correlated dimen- 
sions of length, breadth, and depth in the boxes. 
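The same cosines can be reached from the matrix C of cosines between the reference vectors. The sketch below assumes, as the expression DC⁻¹D suggests, that D is the diagonal matrix which rescales C⁻¹ to unit diagonal; on that assumption the reciprocals of its elements are the column multipliers 1.163, 1.166, and 1.017 used earlier to turn the reference-vector structure into the primary-factor pattern. Names are invented.

```python
from math import sqrt

# Cosines of the angles between the reference vectors (C = A'A).
C = [
    [1.0, -0.494, -0.079],
    [-0.494, 1.0, -0.103],
    [-0.079, -0.103, 1.0],
]

def inverse_3x3(m):
    """Inverse of a 3 x 3 matrix by the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

C_inv = inverse_3x3(C)

# Column multipliers: square roots of the diagonal of C-inverse.
multipliers = [sqrt(C_inv[k][k]) for k in range(3)]

# D C^-1 D, with D = diag(1 / multiplier): cosines between primary factors.
cosines = [[C_inv[i][j] / (multipliers[i] * multipliers[j])
            for j in range(3)] for i in range(3)]

print([round(m, 3) for m in multipliers])
print([[round(x, 3) for x in row] for row in cosines])
```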

6. Criticisms of simple structure. Thurstone's argument 
is then, of course, that as this process of analysis leads to 
fundamental real entities in the case of the boxes (and 
also in his " trapezium " example, Thurstone, 1944, 
p. 84, with four oblique factors), it may be presumed to 
give us fundamental entities when it is applied to mental 
measurements. And I confess that the argument is very 
strong.

My fears or doubts arise from the possibility that the
argument cannot legitimately be reversed in this way. 
There is no doubt that if artificial test scores are made up 
with a certain number of common factors, simple structure 
(oblique if necessary) can be reached and the factors 
identified. But are there other ways in which the test 
scores could have been made ? Spearman's argument was 
a similar reversal. If test scores are made with only one 
common factor, then zero tetrad-differences result. But 
zero tetrad-differences can be approached as closely as we 
like by samples of a large number of small factors, with 
very few indeed common to all the tests. 

However, Thurstone's simple structure is a much more 
complex phenomenon than Spearman's hierarchical order, 
and yet he seems to have had no great difficulty in finding 
batteries of tests which give simple structure to a reason- 
able approximation. I am not sceptical, merely cautious, 
and admittedly much impressed by Thurstone's ability 
both in the mathematical treatment and in the devising 
of experiments. Moreover, his idea of " second-order 
factors," to which we turn in the next chapter, promises a 
reconciliation of Spearman's idea of g, Thurstone's primary 
factors, and (so he tells us in a recent article) my own idea 
of sampling the " bonds " of the mind. 

Thurstone might, I think, put his case in this way. He 
assembles a battery of tests which to his psychological 
intuition appear to contain such and such psychological 
factors, some being memory tests, some numerical, etc.,
etc., no test, however, containing (to his mind) all these
expected factors. He then submits their correlations to 
his calculations, reaches oblique simple structure, and 
compares this analysis with his psychological expectation. 
If there is agreement, he feels confirmed both in his psy- 
chology and in the efficacy of his method of finding factors 
mathematically. Usually there will not be complete 
agreement, and he is led to modify his psychological ideas 
somewhat, in a certain direction. To test the truth of these 
further ideas he again makes and analyses a battery. 
Especially he looks to see if the same factors turn up in 
various batteries. He uses his analyses as guides to
modifications of his psychological hypotheses, or as con-
firmation of them. In Great Britain Thurstone's hypo- 
thesis of simple structure has been, I think it is correct to 
say, rather ignored than criticized. The preoccupation of 
most British psychologists since 1939 with the tasks the 
war has brought is partly to blame for this neglect, and 
partly also the fact that most of them have imbibed during
their education a belief in and a partiality for "Spearman's
g," a factor apparently abolished by Thurstone.
Since his work on second-order factors rehabilitates g, 
this objection should disappear, and his method be at least
accepted as a device (a very powerful device) for arriving
if desired at g and factors orthogonal to it, indirectly. He
himself thinks that the oblique first-order factors are more 
real and more invariant. 

An early form of response to his work was to show that 
his batteries could also be analysed after Spearman's 
fashion. Holzinger and Harman (1938), using the Bifactor 
method, reanalysed the data of Thurstone's Primary 
Mental Abilities and found an important general factor due, 
as they truly say, " to our hypothesis of its existence and 
the essentially positive correlations throughout." Spear- 
man (1939) in a paper entitled Thurstone's Work 
Reworked reached much the same analysis, and raised 
certain practical or experimental objections, claiming that 
his g had merely been submerged in a sea of error. But 
there is more in it than that. As I said in my contribution 
to the Reading University Symposium (1939) Thurstone 
could correct all the blemishes pointed out by Spearman 
and would still be able to attain simple structure. I said 
on that occasion that however juries in America and in 
Britain might differ at present, the larger jury of the future 
would decide by noting whether Spearman's or Thurstone's 
system had proved most useful in the hands of the prac- 
tising psychologist. I now think that they will certainly 
also consider which set of factors has proved most invariant 
and most real. Very likely the two criteria may lead to 
the same verdict. But for the present the two rival claims 
are in the position described by the Scottish legal phrase, 
"taken ad avizandum"; and perhaps before the judge
returns to the bench the matter may have been settled out
of court by Thurstone's reconciliation via " second-order " 
factors. 

7. Reyburn and Taylor's method. These South African 
psychologists have proposed to let psychological insight 
alone guide the rotations to which axes are subjected, 
and have criticized simple structure (see especially their 
paper 1943a), for lack of objectivity, failure to produce 
invariance under change of tests, and on the grounds that
at best it yields the factors that were put in. They agree 
that harmony, within the limits of error, between psycho- 
logical hypothesis and the mathematical simple structure 
to a certain extent confirms both, but they criticize especi- 
ally those who assume that even in complete previous 
ignorance of the factors we are entitled to assign objectivity 
and psychological meaning to those indicated by simple 
structure. And anyhow, they urge, simple structure is 
now, when obliquity is permitted, all too easily reached to 
prove anything. They themselves do not necessarily 
insist on a g (see their 1941a, pages 253, 254, 258, etc.). 
Their own plan is to choose a group of tests which their 
psychological knowledge, and a study of all that is pre- 
viously known, leads them to consider to be clustered 
round a factor. They therefore cause one of their axes 
to pass through the centroid of this cluster, keeping all 
axes orthogonal. This factor axis they do not subse- 
quently move. They then formulate a hypothesis about 
a second factor and select a second group of tests, through 
whose centroid (retaining orthogonality) they pass their 
second factor axis. And so on. There is some affinity 
between this and Alexander's method of rotation (see 
page 36). 

The arithmetical details of their method are as follows. 
They first obtain a table of centroid loadings in the usual 
way. Then, having chosen a group of tests which they 
think form, psychologically, a cluster, they add together 
the rows of the centroid table which refer to those tests, 
thus obtaining numbers proportional to the loadings of 
their centroid. These, after being normalized, form the 
first column of their rotating matrix. For example, 



consider this (imaginary and invented) table of loadings 





               Loadings
Test      I      II     III      h²
  1      .4     -.3      .1     .26
  2      .5      .3     -.6     .70
  3      .6      .3     -.3     .54
  4      .5     -.2      .1     .30
  5      .4     -.4     -.2     .36
  6      .5      .4      .2     .45
  7      .5     -.2     -.1     .30
  8      .7     -.4      .1     .66
  9      .7      .2      .3     .62
 10      .6      .4      .4     .68



Reyburn and Taylor now decide, let us suppose, that 
Tests 9 and 10 are, in their psychological view, very 
strongly impregnated with a verbal factor, and they decide 
to rotate their original factors until one of them passes 
through the centroid of these two tests. They extract 
their rows, add them together, and normalize the three 
totals thus : 



             I      II     III
 (9)        .7      .2      .3
(10)        .6      .4      .4
           1.3      .6      .7    Sum of squares 2.54 = 1.594²
          .816    .376    .439    obtained by dividing by 1.594.
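
The adding and normalizing just described can be sketched in a few lines. This is only an illustration of the arithmetic, using the rows of Tests 9 and 10; the variable names are invented for the purpose.

```python
import math

# Centroid loadings of Tests 9 and 10 (columns I, II, III).
row_9 = [0.7, 0.2, 0.3]
row_10 = [0.6, 0.4, 0.4]

# Add the rows of the chosen cluster, column by column.
totals = [a + b for a, b in zip(row_9, row_10)]

# Normalize: divide each total by the square root of the sum of squares.
length = math.sqrt(sum(t * t for t in totals))
first_column = [t / length for t in totals]

print(round(length, 3))                      # 1.594
print([round(x, 3) for x in first_column])   # [0.816, 0.376, 0.439]
```

The three normalized totals are the direction cosines of the centroid of the cluster, which is why they can serve as the first column of an orthogonal rotating matrix.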

If the columns of the original table are multiplied by these 
three numbers and the rows added, the result is the first 
column of the rotated factor loadings in the table below. 
To get the other two columns we must complete the rotating 
matrix in such a manner that the axes remain orthogonal. 
How this is done will be explained separately later. 
Meanwhile, consider the matrix 



 .816    .399    .417
 .376    .183   -.909
 .439   -.898    .000

Its first column is composed of the above numbers. It is 
orthogonal, for the sum of the squares of any row or column 
is unity, and the inner product of any two is zero. When 
the original table of loadings is post-multiplied by this we 
get the rotated table : 



            Rotated Loadings
Test                                h²
  1     .258    .015    .440      .260
  2     .257    .793   -.064      .699
  3     .471    .564   -.022      .540
  4     .377    .073    .390      .300
  5     .088    .266    .530      .359
  6     .646    .093   -.155      .450
  7     .289    .253    .390      .300
  8     .465    .116    .656      .660
  9     .778    .047    .110      .620
 10     .816   -.047   -.113      .681



At this point two checks must be made. The sum of the 
squares (h²) of each row must still be the same : and the 
inner product of each pair of rows must still be the same 
(it is sufficient to test consecutive rows only). For 
example, in the original matrix the inner product of rows 
7 and 8 was 

.5 × .7 + .2 × .4 - .1 × .1 = .42 
and in the rotated matrix it was 

.289 × .465 + .253 × .116 + .390 × .656 = .420. 
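
The rotation and its two checks can be sketched as follows. This is a minimal illustration, not Reyburn and Taylor's own procedure: the signs of the loadings and of the rotating matrix are taken as the arithmetic of the example implies, the helper names are invented, and a small tolerance is allowed because the rotating matrix is rounded to three decimals.

```python
# Centroid loadings of the ten invented tests (columns I, II, III).
loadings = [
    [0.4, -0.3, 0.1],
    [0.5, 0.3, -0.6],
    [0.6, 0.3, -0.3],
    [0.5, -0.2, 0.1],
    [0.4, -0.4, -0.2],
    [0.5, 0.4, 0.2],
    [0.5, -0.2, -0.1],
    [0.7, -0.4, 0.1],
    [0.7, 0.2, 0.3],
    [0.6, 0.4, 0.4],
]

# Rotating matrix; its first column is the normalized centroid of Tests 9 and 10.
rotation = [
    [0.816, 0.399, 0.417],
    [0.376, 0.183, -0.909],
    [0.439, -0.898, 0.0],
]


def inner(u, v):
    """Inner product of two rows."""
    return sum(x * y for x, y in zip(u, v))


def post_multiply(table, matrix):
    """Post-multiply a table of loadings by a square matrix."""
    columns = list(zip(*matrix))
    return [[inner(row, col) for col in columns] for row in table]


rotated = post_multiply(loadings, rotation)

# Check 1: the communality of each row is unchanged.
communality_gaps = [
    abs(inner(a, a) - inner(b, b)) for a, b in zip(loadings, rotated)
]
# Check 2: the inner product of consecutive rows is unchanged.
product_gaps = [
    abs(inner(loadings[i], loadings[i + 1]) - inner(rotated[i], rotated[i + 1]))
    for i in range(len(loadings) - 1)
]
TOLERANCE = 0.003  # allowance for the three-decimal rounding of the matrix
checks_pass = max(communality_gaps + product_gaps) < TOLERANCE

for number, row in enumerate(rotated, start=1):
    print(number, [round(x, 3) for x in row])
print("checks pass:", checks_pass)
```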

The first factor now goes through the centroid of Tests 
9 and 10, and we scan the loadings it has in the other 
tests to see if these are consistent with their psychological 
nature. For instance, Test 5 has practically no loading on 
this verbal factor : is this consistent with our psychological 
opinion of this test ? 

If this scrutiny is satisfactory, the psychologist using 
this method then proceeds to consider where he will place 
his second factor ; for the second and third columns of the 
above loadings have still no necessary psychological meaning 
as they stand. Exactly the same procedure is carried 
out with them, the first column being left unaltered. 
Suppose the psychologist decided on Tests 5, 7, 8 as being 
a cluster round (say) a numerical factor. He adds their 
rows 

 (5)      .266     .530
 (7)      .253     .390
 (8)      .116     .656
          .635    1.576
          .374     .928    when normalized 

and uses their normalized totals as the first column of a 
matrix to rotate these last two columns. The matrix 
must be orthogonal, and it is in fact 



 .374    .928
 .928   -.374



When the second and third columns are rotated by post- 
multiplication by this, the final result is : 

Final Rotated Loadings 

  1     .258    .414   -.151
  2     .257    .237    .760
  3     .471    .191    .532
  4     .377    .389   -.078
  5     .088    .591    .049
  6     .646   -.109    .144
  7     .289    .457    .089
  8     .465    .652   -.138
  9     .778    .120    .002
 10     .816   -.122   -.001

(The same checks must now be repeated.) The psycho- 
logist now scans column two to see if the loadings of his 
numerical factor agree reasonably with his idea of each 
test, and is rather sorry to see two negative loadings, but 
consoles himself by thinking that they are small. He 
must finally try to name his third factor, present to an 
appreciable extent only in tests 2 and 3. If he thinks he 
recognizes it, he is content. 
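
The second rotation can be sketched in the same way. The figures below are the second and third columns of the first rotated table, with signs as the arithmetic of the example implies; the names are invented, and the order-2 matrix is built directly from the normalized totals of Tests 5, 7, and 8.

```python
import math

# Columns 2 and 3 of the first rotated table, Tests 1 to 10.
columns_2_3 = [
    (0.015, 0.440), (0.793, -0.064), (0.564, -0.022), (0.073, 0.390),
    (0.266, 0.530), (0.093, -0.155), (0.253, 0.390), (0.116, 0.656),
    (0.047, 0.110), (-0.047, -0.113),
]
cluster = [5, 7, 8]  # tests thought to cluster round the second factor

# Add the rows of the cluster and normalize the two totals.
sum_2 = sum(columns_2_3[t - 1][0] for t in cluster)
sum_3 = sum(columns_2_3[t - 1][1] for t in cluster)
length = math.hypot(sum_2, sum_3)
u, v = sum_2 / length, sum_3 / length

# Post-multiply by the order-2 orthogonal matrix [[u, v], [v, -u]].
final_2_3 = [(u * x + v * y, v * x - u * y) for x, y in columns_2_3]

# Tests with a negative loading on the new second factor.
negative_on_factor_2 = [
    test for test, (second, _) in enumerate(final_2_3, start=1) if second < 0
]

print(round(u, 3), round(v, 3))
print(negative_on_factor_2)
```

Because u and v are carried here to full precision, an occasional third decimal may differ by a unit from the printed table, which was worked with .374 and .928.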

8. Special orthogonal matrices. To carry out the 
above process the reader needs to have at his disposal 
orthogonal matrices of various sizes, such that he can give 
the first column any desired values. The following will 
serve his purpose. Except for the first one, they are not 
unique, and alternatives can be made. 



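Order 2 

$$\begin{pmatrix} u & v \\ v & -u \end{pmatrix}, \qquad u^2 + v^2 = 1$$

Order 3 

$$\begin{pmatrix} mq & mp & l \\ lq & lp & -m \\ p & -q & 0 \end{pmatrix}, \qquad l^2 + m^2 = p^2 + q^2 = 1$$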

It was from this formula that the matrix used in the last 
section, with first column of .816, .376, .439, was made. 
For if we set 

p = .439 

we have q = .898 

and from mq = .816 

we have m = .909 

and thence l = .417 
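
A small sketch of the Order 3 recipe follows, assuming the arrangement of signs that makes the columns orthogonal (first column mq, lq, p ; second mp, lp, -q ; third l, -m, 0). The function name is invented. It works from an exactly normalized first column, so its entries may differ in the third decimal from the matrix above, which was built from rounded intermediate values.

```python
import math


def order3_orthogonal(first_column):
    """Return a 3x3 orthogonal matrix with the given unit vector as first column.

    Assumes the third element p is not +1 or -1, so that q is not zero.
    """
    a, b, p = first_column
    q = math.sqrt(1.0 - p * p)      # p² + q² = 1
    m, l = a / q, b / q             # so that mq = a and lq = b
    return [
        [m * q, m * p, l],
        [l * q, l * p, -m],
        [p, -q, 0.0],
    ]


# First column: the normalized totals of Tests 9 and 10.
totals = [1.3, 0.6, 0.7]
length = math.sqrt(sum(t * t for t in totals))
first = [t / length for t in totals]
matrix = order3_orthogonal(first)

# Orthogonality check: inner products of the columns with one another.
columns = list(zip(*matrix))
gram = [[sum(x * y for x, y in zip(c1, c2)) for c2 in columns] for c1 in columns]

for row in matrix:
    print([round(x, 3) for x in row])
```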



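Order 4. 

$$\begin{pmatrix} a & -b & -c & -d \\ b & a & d & -c \\ c & -d & a & b \\ d & c & -b & a \end{pmatrix}, \qquad a^2 + b^2 + c^2 + d^2 = 1$$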



This one was used by Reyburn and Taylor in their 1939 
article (page 159). 

Similar matrices of higher order can be made by a 
recipe given by them, viz. multiplying together two or 
more of the above, suitably extended by ones and zeros. 
For example, a matrix, orthogonal and with arbitrary first 
column, of order 5, can be made by multiplying together : 





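[Product of two matrices of order 5 : the Order 3 matrix in 
l, m, p, q and a further matrix in λ, μ, κ, φ, each 
extended by ones and zeros. The individual entries are not 
legible in this copy.] 

where l² + m² = p² + q² = λ² + μ² = κ² + φ² = 1. 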

9. Identity of oblique factors after univariate selection. 
Thurstone, in his recent book Multiple Factor Analysis 
(1947), discusses in Chapter XIX the effects of selection, 
and shows by examples that if a battery of tests yields 
simple structure with oblique factors (including, of course, 
the orthogonal case), then after univariate selection the 
same factors are identified by the new structure, which is 
still simple. 

If, for example, the battery which gives the correlations 
on our page 245, and yields Figure 28 on page 251, has the 
standard deviation of Test 2 reduced to one-half, then by 
the methods described on our pages 171-6 we can calculate 
that the matrix of correlations and communalities becomes : 



          1       2       3       4       5       6
  1     .589    .295   -.044   -.140    .366    .000
  2     .295    .302    .049    .159    .183    .000
  3    -.044    .049    .555    .115    .304    .506
  4    -.140    .159    .115    .371   -.087    .000
  5     .366    .183    .304   -.087    .439    .322
  6     .000    .000    .506    .000    .322    .493

The rank of this matrix is still 3 as it was before selection, 
and three centroid factors are found to have loadings 

          I      II     III
  1     .409    .647    .058
  2     .379    .244   -.315
  3     .569   -.444    .184
  4     .160   -.271   -.522
  5     .585    .174    .257
  6     .506   -.350    .337

When these are " extended " in the manner of our page 251 
and a diagram like Figure 28 made, we obtain Figure 31. 
It is still a triangle, and although its measurements are 
different, the same tests are found defining each side as 
before. The corners of the triangle may, with Professor 
Thurstone, reasonably be claimed to represent the same 
factors as before selection, although their correlations 
have changed. 
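
That three centroid factors suffice after selection can be checked by multiplying the table of loadings by its own transpose and comparing the result with the selected matrix of correlations and communalities. The sketch below does this, taking the signs as the arithmetic of the example implies and allowing a small tolerance for three-decimal rounding ; the names are invented.

```python
# Centroid loadings of the six tests after selection (columns I, II, III).
F = [
    [0.409, 0.647, 0.058],
    [0.379, 0.244, -0.315],
    [0.569, -0.444, 0.184],
    [0.160, -0.271, -0.522],
    [0.585, 0.174, 0.257],
    [0.506, -0.350, 0.337],
]

# Correlations (communalities in the diagonal) after selection.
R = [
    [0.589, 0.295, -0.044, -0.140, 0.366, 0.000],
    [0.295, 0.302, 0.049, 0.159, 0.183, 0.000],
    [-0.044, 0.049, 0.555, 0.115, 0.304, 0.506],
    [-0.140, 0.159, 0.115, 0.371, -0.087, 0.000],
    [0.366, 0.183, 0.304, -0.087, 0.439, 0.322],
    [0.000, 0.000, 0.506, 0.000, 0.322, 0.493],
]

# Inner products of every pair of rows of F reproduce R if the rank is 3.
reproduced = [[sum(a * b for a, b in zip(ri, rj)) for rj in F] for ri in F]
largest_gap = max(
    abs(reproduced[i][j] - R[i][j]) for i in range(6) for j in range(6)
)

print(round(largest_gap, 4))
```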

The plane of Figure 31 is not the same as the plane of 
Figure 28, being at right angles to a different first 
centroid. When adjustment is made for this, as Professor 
Thurstone has presumably done in his chapter (though, I 
protest, without sufficient explanation), then the directly 
selected test point has not moved, while the other points 
have moved radially away from or towards it. 

[Figure 31.] 

If the above matrix of centroid loadings is postmultiplied 
by the rotating matrix obtained from the diagram, 
viz. 

 .721    .443    .641
-.499   -.201    .744
 .480   -.874   -.190

we obtain the new simple structure on the reference vectors, 

Test      A       B       C
  1       0       0     .732
  2       0     .394    .484
  3     .562    .180      0
  4       0     .472      0
  5     .459      0     .455
  6     .702      0       0

If this is compared with the table on page 247 it will be 
seen that the zeros are in the same places, although the 
non-zero entries have altered (except in Test 6, which was 
uncorrelated with the directly selected Test 2, and therefore 
is unaffected in composition). 

If the correlations between the factors are calculated by 
the method of pages 283-4, factor A is found to be still 
uncorrelated with B and C, but these last two have a 
correlation coefficient of -.3 : that is, they are no longer 
orthogonal but at an obtuse angle of about 107½°. 

10. Multivariate Selection and Simple Structure. But 
though Thurstone must, I think, be granted his claim that 
univariate selection will not destroy the identity of his 
oblique factors, but only change their intercorrelations, the 
situation would seem to be very different with multivariate 
selection. 

Multivariate selection is not the same thing as repeated 
univariate selection. The latter will not change the rank 
of the correlation matrix with suitable communalities, nor 
will it change the position of zero loadings in simple struc- 
ture. Repeated univariate selection will, it is true, cause 
all the correlations to alter, but only indirectly and in such 
a way as to preserve rank, simple structure, and factor 
identity. 

But in multivariate selection it is envisaged that the 
correlation between two variables may itself be directly 
selected, and caused to have a value other than that which 
would naturally follow from the reduction of standard 
deviation in two selected variables. Selection for correla- 
tion is just as easily imagined as is selection for scatter. 
Indeed in natural selection it is possibly even commoner. 

Once we select for the correlations, however, as well as 
for scatter, new " factors " emerge, old ones change. In 
our Chapter XII we supposed a small part R_pp of the whole 
correlation matrix to be changed to V_pp, and found that 
one new factor is created (page 193) or, indeed, two new 
oblique factors (page 192). We might have supposed R_pp to 
be a larger portion of R : and there is nothing to prevent 
us supposing selection to go on for the whole of R, and 
writing down a brand-new table of coefficients, whose 
" factors " would be quite different from those of the original 
table. In our example of page 245, for instance, 
where the three oblique " factors " coincided in direction 
with the communal parts of Tests 1, 4, and 6, there is 
nothing to prevent us from writing down, as having 
been produced by selection, a new set of correlation coeffici- 
ents whose analysis would identify the " factors " with the 
communal parts of Tests 2, 3, and 5. In fact, all we would 
have to do would be to renumber the rows and columns on 
page 245. Such fundamental changes could be produced 
by selection : and perhaps they have been, for natural 
selection has had plenty of time at its disposal. 

Professor Thurstone (his page 458, footnote, in Multiple 
Factor Analysis) classes the new factors produced by 
selection as " incidental factors (which) can be classed 
with the residual factors, which reflect the conditions of 
particular experiments." But we can hardly dismiss 
them thus easily if, as is conceivable, they have become 
the main or perhaps the only factors remaining, the others 
having disappeared ! 

It may be admitted at once, however, that the actual 
amount of selection from psychological experiment to 
psychological experiment is not likely to make such 
alarming changes in factors. For the use to which factors 
are likely to be put in our age, in our century or more, they 
are likely to be independent enough of such selection as can 
go on in that time, and in that sense Professor Thurstone 
is justified in his thesis. Nor am I one to deny " reality " 
to any quality merely because it has been produced by 
selection, and may not abide for all time. 

11. Parallel proportional profiles. A method which, like 
Thurstone's simple structure, is meant to enable us to 
arrive at factors which are real entities, or to check 
whether our hypotheses about the factor composition of 
tests are correct, has been put forward by R. B. Cattell 
(1944b, 1946), and has interesting possibilities which its 
author will no doubt develop. The essence of his idea 
is that " if a factor is one which corresponds to a true 
functional unity, it will be increased or decreased 'as a 
whole '," and therefore if the same tests are given under 
two different sets of circumstance, which favour a certain 
factor more in one case and less in the other, the loadings 
of the tests in that factor should all change in the same pro- 
portion. Experimental trials of this principle may be ex- 
pected soon from its author. Among " different circum- 
stances " he mentions different samples of subjects, differ- 
ing, say, in age or sex, and different methods of scoring, or 
different associated tests in the battery. But he prefers 
another kind of change of circumstance ; namely a change 
" from measures of static, inter-individual differences to 
measures from other sources of differences in the same 
variables." He instances, among his examples, inter- 
correlating changes in scores of individuals with time, or 
intercorrelating differences of scores in twins. We may 
thus have two, or several, centroid analyses, and the mathe- 
matical problem is to find rotations which will leave the 
profile of loadings of a certain factor similar in all the factor 
matrices. It may even be that the profiles of several fac- 
tors could be made similar. These factors would then 
satisfy Cattell's requirement as corresponding to " true 
functional unities." The necessary modes of calculation 
to perform these rotations have not yet been more than 
adumbrated, however. 

12. Estimation of oblique factors. In applying the 
method of section 2 of Chapter VII (pp. 107-10) to oblique 
factors, it is important to note that we must use, below the 
matrix of correlations of the tests, in a calculation like that 
on page 108, the matrix of correlations of the primary 
factors with the tests. These are the elements of the 
structure on the primary factors, F(Λ′)⁻¹D, given at the top 
of page 278, transposed so that columns become rows and 
vice versa. It would not do to use the structure on the 
reference vectors, which is all that most experimenters 
content themselves with calculating. 

Ledermann's short cut (section 3 of Chapter VII, pp. 
110-12) requires considerable modification in the case of 
oblique factors. See Thomson (1949) and the later part of 
section 19 of the Mathematical Appendix, page 378. 



CHAPTER XIX 

SECOND-ORDER FACTORS 

1. A second-order general factor. The reason why the 
factors arrived at in the " box " example were correlated 
was that large boxes tend to have all their dimensions 
large. There is a typical shape for a box, often departed 
from, yet seldom to an extreme degree. Therefore the 
length, breadth, and depth of a series of boxes are corre- 
lated, and so also are Thurstone's primary factors in such 
a case. There is a size factor in boxes, a general factor 
which does not appear as a first-order factor (those we 
have been dealing with) in Thurstone's analysis, but 
causes these primary factors to be correlated. Possibly, 
therefore, when oblique factors appear in the factorial 
analysis of psychological tests, there is a hidden general 
factor causing the obliquity. This factor or factors (for 
there might be more than one) can be arrived at by analys- 
ing the first-order factors, into what Thurstone calls 
second-order factors, factors of the factors. 

Of course, whether such a procedure could be justified 
by the reliability of the original experimental data is very 
doubtful in most psychological experiments. The super- 
structure of theory and calculation raised upon those data 
is already, many would urge, perhaps rather top-heavy, and 
to add a second storey unwise. But we should not, I 
think, let this practical question deter us from examining 
what is undoubtedly a very interesting and illuminating 
suggestion, which may turn out to be the means of recon- 
ciling and integrating various theories of the structure of 
the mind. 

If we take the primary factors of our " box " example of 
Chapter XVIII, they were correlated as shown in this 
matrix : 

  1       .506    .150
 .506      1      .164
 .150     .164     1



If we analyse these in their turn into a general factor 
and specifics we obtain, using the formula 

$$\text{g saturation of factor 1} = \sqrt{\frac{r_{12}\, r_{13}}{r_{23}}}$$

the saturations of the primary factors with a second-order 
g as .680, .744, and .220 ; and each primary factor will 
also have a factor specific. We have now replaced the 
analysis of the original tests into three oblique factors by 
an analysis into four orthogonal factors, one of them 
general to the oblique factors and presumably also general 
to the original tests, though that we have still to inquire 
into. We must also inquire into the relationship of the 
specifics of the original tests to these second-order factors, 
which are no longer in the original three-dimensional 
common-factor space, but in a new space of four dimen- 
sions. Are the original test-specifics orthogonal to this 
new space ? 
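
The second-order saturations can be verified with a short calculation. The sketch below applies the three-variable formula to the correlations of the primary factors, here labelled L, B, and D after the length, breadth, and depth of the boxes ; the names are invented. Each factor specific then has loading √(1 - g²) on its own primary factor.

```python
import math

# Correlations between the three primary factors of the box example.
r_LB, r_LD, r_BD = 0.506, 0.150, 0.164

# Saturation of each primary factor with the second-order g.
g_L = math.sqrt(r_LB * r_LD / r_BD)
g_B = math.sqrt(r_LB * r_BD / r_LD)
g_D = math.sqrt(r_LD * r_BD / r_LB)
saturations = [g_L, g_B, g_D]

# A saturation above unity would be a Heywood case ; none occurs here.
heywood = any(g > 1.0 for g in saturations)

# Loading of each factor specific on its own primary factor.
specifics = [math.sqrt(1.0 - g * g) for g in saturations]

print([round(g, 3) for g in saturations])   # [0.68, 0.744, 0.22]
print([round(s, 3) for s in specifics])     # [0.733, 0.668, 0.975]
```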

With only three oblique factors, an analysis into one g 
is always possible (except in the Heywood case, which will 
often occur among oblique factors). If there had been 
four or more oblique factors, we would have had to use more 
second-order general factors unless the tetrad-differences 
were zero. Thurstone's " trapezium " example already 
referred to had four oblique factors, and his article should 
be consulted by the interested. 

2. Its correlations with the tests. Let us turn now to the 
question what the correlations are between the seven 
original tests and the above second-order g. To obtain 
these Thurstone uses an argument equivalent to the fol- 
lowing : 

We may first note that each reference vector makes an 
acute angle with its own primary factor, but is at right 
angles to every other primary factor, for these are all 
contained in the hyperplane to which it is orthogonal. 
The cosines of the angles can be obtained by premultiplying 
the rotation matrix of the reference vectors by the transpose 
of the rotation matrix of the primary factors. 

Correlations between Primary Factors and Reference Vectors 

$$D\Lambda^{-1} \times \Lambda = D$$

 .797   .400   .453       .405   .464   .338       .860
 .835   .187  -.517   ×   .426   .076  -.916   =          .858
 .503  -.843   .192       .809  -.883   .215                     .983

These cosines in the diagonal of the matrix D give us the 
angles 31°, 31°, and 11° which we have already mentioned 
on page 280 as the angles between each primary factor and 
its own reference vector. 

Each row of the first of the above matrices represents 
the projections of the primary factor on to the orthogonal 
centroid axes. These are, in fact, the loadings of the prim- 
ary factors, thought of as imaginary or possible tests, 
in the orthogonal centroid factors I, II, and III. Following 
Thurstone, we add these three rows below the seven rows of 
our original seven real tests, extending the matrix F in 
length thus : 



Test      I      II     III      r_g
  1     .449   -.682    .165    .211  }
  2     .825   -.478   -.129    .574  }
  3     .906    .336    .020    .787  }
  4     .846    .133    .457    .666  }  wanted
  5     .808    .208   -.412    .719  }
  6     .697    .336    .335    .597  }
  7     .767    .173   -.468    .683  }
  L     .797    .400    .453    .680  }
  B     .835    .187   -.517    .744  }  known
  D     .503   -.843    .192    .220  }



This lengthened matrix we want to post-multiply by 
a column vector (ψ in Thurstone's notation) to give the 
correlations of the tests, including the imaginary tests 
L, B, and D, with the second-order g. In other words, we 
want to know by what weights each column must be multiplied 
so that their weighted sum is the correlation of 
each row with g. Suppose these weights are u, v, and w. 
Since we already know from our second-order analysis 
what r_g is for each of the primaries L, B, and D, we have 
three equations for u, v, and w, the solution of which gives 
us their values. We have 

.797u + .400v + .453w = .680 
.835u + .187v - .517w = .744 
.503u - .843v + .192w = .220 

and these equations can be solved in the usual way, if 
the reader wishes. The values are .798, .198, and -.077. 
A closer examination of them, however, which can be 
most readily expressed in matrix notation, leads to an 
easier plan, especially desirable if the number of primary 
factors were greater. In matrix form the above equations 
are 

$$T\psi = r_g,$$

whence 

$$\psi = T^{-1} r_g,$$

and since T is merely a short notation for DΛ⁻¹ we have 

$$\psi = \Lambda D^{-1} r_g.$$

That is to say, the centroid loadings F of the seven tests 
have to be post-multiplied by this, giving a matrix (a 
single column) 

$$F\psi = F\Lambda D^{-1} r_g.$$

But FΛ we already know. It is (see page 276) the simple 
structure V on the reference vectors. So we merely have 
to multiply the columns of V by D⁻¹r_g and add the rows to 
get the correlation of each test with g. These multipliers 
are, that is to say : 

.680 ÷ .860 = .791 
.744 ÷ .858 = .867 
.220 ÷ .983 = .224 
The results are the same as by the former method, except 






for discrepancies due to rounding off decimals, and are 
given to the right of the preceding table. 
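
The agreement of the two methods can be checked numerically. The sketch below solves the three equations for the weights by Cramer's rule, forms the same weights by the shorter route ΛD⁻¹r_g, and then applies them to the centroid loadings of the seven tests. Signs are taken as the arithmetic of the example implies, the helper names are invented, and a small tolerance covers three-decimal rounding.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    solution = []
    for k in range(3):
        replaced = [row[:k] + [rhs[j]] + row[k + 1:] for j, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution


# Rows: direction cosines of the primary factors L, B, D on the centroid axes.
T = [[0.797, 0.400, 0.453], [0.835, 0.187, -0.517], [0.503, -0.843, 0.192]]
# Columns: direction cosines of the reference vectors.
Lam = [[0.405, 0.464, 0.338], [0.426, 0.076, -0.916], [0.809, -0.883, 0.215]]
D = [0.860, 0.858, 0.983]      # diagonal of D
r_g = [0.680, 0.744, 0.220]    # second-order g saturations of L, B, D

# Method 1: solve T psi = r_g directly.
psi_direct = solve3(T, r_g)

# Method 2: psi = Lam D^-1 r_g, using the multipliers r_g / D.
multipliers = [r / d for r, d in zip(r_g, D)]
psi_short = [sum(Lam[i][j] * multipliers[j] for j in range(3)) for i in range(3)]

# Correlations of the seven tests with g: centroid loadings times psi.
F = [
    [0.449, -0.682, 0.165], [0.825, -0.478, -0.129], [0.906, 0.336, 0.020],
    [0.846, 0.133, 0.457], [0.808, 0.208, -0.412], [0.697, 0.336, 0.335],
    [0.767, 0.173, -0.468],
]
test_r_g = [sum(x * w for x, w in zip(row, psi_short)) for row in F]

print([round(x, 3) for x in multipliers])   # [0.791, 0.867, 0.224]
print([round(x, 3) for x in psi_direct])
print([round(x, 3) for x in test_r_g])
```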

3. A g plus an orthogonal simple structure. In his own 
examples Thurstone has not calculated the loadings of the 
original tests with the other orthogonal second-order 
factors, the factor specifics. This can, however, clearly be 
done by the same method as above. Since the correlations 
of the general factor with the three oblique factors are 
.680, .744, and .220, the correlations of each factor specific 
with its own oblique factor are .733, .668, and .975. For 
example, .733² = 1 - .680². The second-order analysis 
therefore is : 



       .680    .733     0       0
E  =   .744     0     .668      0
       .220     0       0     .975



Dividing the rows by the divisors already mentioned, viz. 
.860, .858, and .983, we obtain the matrix 

       .791    .853     0       0
       .867     0     .779      0
       .224     0       0     .992

and when the matrix V is post-multiplied by this we 
obtain the following analysis of the original seven tests 
into a general factor plus an orthogonal simple structure 
of three factors : 

General Factor plus Simple Structure 

G = VD⁻¹E 

Test      g       λ       β       δ
  1     .211    .021    .009    .805
  2     .574    .022    .358    .683
  3     .787    .449    .333    .006
  4     .666    .656    .001    .260
  5     .719    .071    .588   -.006
  6     .597    .593    .041    .000
  7     .683    .005    .609    .000

The zero or very small entries in λ, β, and δ are in the 
same places as they are for L′, B′, and D′ in the oblique 
simple structure V (see page 276). What we have now 
done is to analyse the box data into four orthogonal 
factors corresponding to size, and ratios of length, breadth, 
and depth. In terms of our pyramidal geometrical 
analogy we have " taken out a general factor " by depress- 
ing the ceiling of our room, squashing the pyramid down 
until its three plane sides are at right angles to each other. 

The above structure, being on orthogonal factors, is also 
a pattern, so that the inner products of its rows ought to 
give the correlation coefficients with the same accuracy, if 
we have kept enough decimal places in our calculations, as 
do the rows of the centroid analysis F : and so they do. 
For example, the correlation between Tests 1 and 2 is, 
from F, 

.449 × .825 + .682 × .478 - .165 × .129 = .675 
and from G it is 
.211 × .574 + .021 × .022 + .009 × .358 + .805 × .683 = .675 

The " experimental " value was .728, the difference of 
.053 being due to the inaccuracy of the guessed 
communalities, or in an actual experimental set of data to 
sampling error and to the rank of the matrix not being 
exactly three. 
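
The comparison for Tests 1 and 2 is easily repeated. The sketch below takes the two rows from the centroid analysis F and the corresponding rows from G as printed (signs as the arithmetic implies) and forms the two inner products ; the names are invented.

```python
# Rows of Tests 1 and 2 in the centroid analysis F (columns I, II, III).
F_1, F_2 = [0.449, -0.682, 0.165], [0.825, -0.478, -0.129]
# The same tests in G (general factor plus three orthogonal factors).
G_1, G_2 = [0.211, 0.021, 0.009, 0.805], [0.574, 0.022, 0.358, 0.683]


def inner(u, v):
    """Inner product of two rows of loadings."""
    return sum(x * y for x, y in zip(u, v))


r_from_F = inner(F_1, F_2)
r_from_G = inner(G_1, G_2)

print(round(r_from_F, 3), round(r_from_G, 3))   # 0.675 0.675
```

Both analyses, being on orthogonal axes, give back the same correlation, which is what is meant by saying that the structure is also a pattern.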

We can see here a distinct step towards a reconciliation 
between the analyses of the Spearman school and those 
of Thurstone using oblique factors. But we must not 
forget that if the oblique factors are not oblique enough, 
the Heywood embarrassment will occur, and a second- 
order g be impossible. The orthogonal factors of G are 
more convenient to work with statistically, but it is possible 
that the oblique factors of V are more realistic both in our 
artificial box example and in psychology. They corre- 
sponded in our case to the actual length, breadth, and 
depth of the boxes. The factors λ, β, and δ of matrix G 
correspond to these dimensions after the boxes have all 
been equalized in " size." 



CHAPTER XX 

THE SAMPLING OF BONDS 

1. Brief statement of views. The purpose of this chapter 
is to give an account of the author's own views as to the 
meaning of " mental factors." This can perhaps be done 
most clearly by first expressing them somewhat emphati- 
cally and crudely, and afterwards adding the details and 
conditions which a consideration of all the facts demands. 
In brief, then, the author's attitude is that he does not 
believe in factors if any degree of real existence is attributed 
to them ; but that, of course, he recognizes that any set 
of correlated human abilities can always be described 
mathematically by a number of variables or " factors," 
and that in many ways, among which no doubt some will 
be more useful or more elegant or more sparing of unneces- 
sary hypotheses. But the mind is very much more com- 
plex, and also very much more an integrated whole, than 
any naive interpretation of any one mathematical analysis 
might lead a reader to suppose. Far from being divided 
up into " unitary factors," the mind is a rich, comparatively 
undifferentiated complex of innumerable influences ; on 
the physiological side, an intricate network of possibilities 
of intercommunication. Factors are fluid descriptive 
mathematical coefficients, changing both with the tests 
used and with the sample of persons, unless we take 
refuge in sheer definition based upon psychological judg- 
ment, which definition would have to specify the particular 
battery of tests, and the sample of persons, as well as the 
method of analysis, in order to fix any factor. Two 
experimental observations are at the bottom of all the 
work on factors, the one that most correlations between 
human performances are positive, the other that square 
tables of correlation coefficients in the realm of mental 
measurement tend to be reducible to a low rank by suitable 
diagonal elements. The first of these (i.e. the 
predominance of positive correlations) appears to be partly a 
mathematical necessity, and partly due to survival value 
and natural selection. The second (i.e. the tendency to 
low rank) is a mathematical necessity if the causal back- 
ground of the abilities which are correlated is comparatively 
without structure, so that any sample of it can occur in an 
ability. This enables one to say that the mind works as if 
it were composed of a smallish number of common faculties 
and a host of specific abilities ; but the phenomenon really 
arises from the fact that the mind is, compared with the 
body, so Protean and plastic, so lacking in separate and 
specialized organs. 

2. Negative and positive correlations.* The great major- 
ity of correlation coefficients reported in both biometric 
and psychological work are positive. This almost certainly 
represents an actual fact, namely that desirable qualities 
in mankind tend to be positively correlated ; for though 
reported correlations may be selected by the unconscious 
prejudices of experimenters, who are usually on the look- 
out for things which correlate positively, yet as those who 
have tried know, it is really very difficult to discover 
negative correlations between mental tests. Besides, even 
in imagination we cannot make a race of beings with 
predominantly negative correlations. A number of lists 
of the same persons in order of merit can be all very like 
one another, can indeed all be identical, but they cannot 
all be the opposite of one another. If Lists a and b are 
the inverse of one another, List c, if it is negatively 
correlated with a, will be positively correlated with b. 
Among a number n of variates, it is logically possible to 
have a square table of correlation coefficients each equal 
to unity ; that is, an average correlation of unity. But 
the farthest the average correlation can be pushed in the 
negative direction is -1/(n - 1). That is, if n is large, 
the average correlation can range from + 1 to only very 
little below zero. Even Mother Nature, then, by natural 
selection or by any other means, could not endow man 

* This section refers to correlations between tests. The greater 
frequency of negative correlations between persons has already been 
discussed in Chapter XIII, Section 8. 

with abilities which showed both many and large negative 
correlations. If they were many, they would have to be 
very small ; if they were large, they would have to be 
very few. 
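
The lower limit quoted above can be seen in one line, on the assumption that the n variates are expressed in standard measure. The variance of their sum cannot be negative, and if r̄ denotes the average of the n(n - 1) correlations off the diagonal, then

$$\operatorname{Var}\Big(\sum_{i=1}^{n} z_i\Big) = n + n(n-1)\,\bar r \;\ge\; 0 \qquad\Longrightarrow\qquad \bar r \;\ge\; -\frac{1}{n-1}.$$

With n = 2 the limit is -1, with n = 11 it is only -.1, and it approaches zero as n increases.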

Natural selection has probably tended, on the whole, to 
favour positive correlations within the species.* In the case 
of some physical organs it is obvious that a high positive 
correlation is essential to survival value ; for example, 
between right and left leg, or between legs and arms. In 
these cases of actual paired organs, however, it is doubtless 
more than a mere figure of speech to speak of a common 
factor as the cause. Between organs not simply related 
to one another, as say eyes and nose, natural selection, 
if it tended towards negative correlation, would probably 
split the genus or species into two, one relying mainly on 
eyesight, the other mainly on smell. Within the one 
species, since it is mathematically easier to make positive 
than negative correlations, it seems likely that the former 
would largely predominate. To say that this was due to 
a general factor would be to hypostatize a very complex 
and abstract cause. To use a general factor in giving a 
description of these variates is legitimate enough, but is, 
of course, nothing more than another way of saying that 
the correlations are mainly positive, if, as is the case, most 
people mean by a general factor one which helps in every 
case, not an interference factor which sometimes helps and 
sometimes hinders. 

* An important kind of natural selection is the selection of one sex 
by the other in mating. Dr. Bronson Price (1936) has pointed out 
that positive cross-correlation in parents will produce positive 
correlation in the offspring. Price further shows that this positive 
cross-correlation in the parents will result if the mating is highly 
homogamous for total or average goodness in the traits, a conclusion which, 
it may be remarked here, can be easily seen by using the pooling 
square described in our Chapter VI. Price concludes : " The 
intercorrelations which g has been presumed to illumine are seen 
primarily as consequences of the social and therefore marital 
importance which has attached to the abilities concerned." Price 
in his argument makes use of formulae from Sewall Wright (1921). 
M. S. Bartlett, in a note on Price's paper (Bartlett, 1937b), develops 
his argument more generally, also using Wright's formulae, and says : 
" Price contrasts the idea of elementary genetic components with 
factor theories. ... It should, however, be pointed out that a 
statistical interpretation of such current theories can be and has been 
advocated. Thomson has, for example, shown . . .", and here 
follows a brief outline of the sampling theory. " On the basis of 
Thomson's theory," Bartlett adds, " I have pointed out (Bartlett, 
1937a) that general and specific abilities may naturally be defined 
in terms of these components, and that while some statistical 
interpretation of these major factors seems almost inevitable, this 
may not in itself render their conception invalid or useless." 

3. Low reduced rank. It is, however, on the tendency 
to a low reduced rank in matrices of mental correlations 
that the theory of factors is mainly built. It has very 
much impressed people to find that mental correlations 
can be so closely imitated by a fairly small number of 
common factors. Ignoring the host of specific factors to 
which this view commits them, they have concluded that 
the agreement was so remarkable that there must be some- 
thing in it. There is ; but it is almost the opposite of 
what they think. Instead of showing that the mind has 
a definite structure, being composed of a few factors which 
work through innumerable specific machines, the low rank 
shows that the mind has hardly any structure. If the 
early belief that the reduced rank was in all cases one had 
been confirmed, that would indeed have shown that the 
mind had no structure at all but was completely undifferentiated. 
It is the departures from rank 1 which indicate 
structure, and it is a significant fact that a general tendency 
is noticeable in experimental reports to the effect that 
batteries do not permit of being explained by as small a 
number of factors in adults as in children, probably because 
in adults education and vocation have imposed a structure 
on the mind which is absent in the young.* 

* See also Anastasi, 1936. 

By saying that the mind has little structure, nothing 
derogatory is meant. The mind of man, and his brain, too, 
are marvellous and wonderful. All that is meant by the 
absence of structure is the absence of any fixed or strong 
linkages among the elements (if the word may for a moment 
be used without implications) of the mind, so that any 
sample whatever of those elements or components can be 
assembled in the activity called for by a " test." 
Not that there is any necessity to suppose that the mind 
is composed of separate and atomic elements. It is pos- 
sibly a continuum, its elements if any being more like the 
molecules of a dissolved crystalline substance than like 
grains of sand. The only reason for using the word 
" elements " is that it is difficult, if not impossible, to speak 
of the different parts of the mind without assuming some 
" items " in terms of which to think. For concreteness it 
is convenient to identify the elements, on the mental side, 
with something of the nature of Thorndike's " bonds," 
and on the bodily side with neurone arcs ; in the remainder 
of this chapter the word " bonds " will be used. But 
there is no necessity beyond that of convenience and 
vividness in this. The " bonds " spoken of may be 
identified by different readers with different entities. All 
a " bond " means is some very simple aspect of the causal 
background. Some of them may be inherited, some may 
be due to education. There is no implication that the 
combined action of a number of them is the mere sum of 
their separate actions. There is no commitment to 
" mental atomism." 

If, now, we have a causal background comprising in- 
numerable bonds, and if any measurement we make can 
be influenced by any sample of that background, one 
measurement by this sample and another by that, all 
samples being possible ; and if we choose a number of 
different measurements and find their intercorrelations, 
the matrix of these intercorrelations will tend to be 
hierarchical, or at least tend to have a low reduced rank. 
This has nothing to do with the mind : it is simply a 
mathematical necessity, whatever the material used to 
illustrate it. 

4. A mind with only six bonds. We shall illustrate this 
fact first by imagining a " mind " which can form only 
six " bonds," which mind we submit to four " tests " 
which are of different degrees of richness, the one requiring 
the joint action of five bonds, the others of four, three, and 
two respectively (Thomson, 1927b). These four tests will 
(when we give them to a number of such minds) yield 
correlations with one another. For we shall suppose the 
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different minds not all to be able to form all six of the 
possible bonds, some individuals possessing all six, others 
possessing smaller numbers. 

We have only specified the richness of each test, but 
have not said which bonds form each ability. There may, 
therefore, be different degrees of overlap between them, 
though some will be more frequent than others if we form 
all the possible sets of four tests which are of richness five, 
four, three, and two. If we call the bonds a, b, c, d, e, 
and f, then one possible pattern of overlap would be the 
following : 
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If we for further simplicity suppose these bonds to be 
equally important, and use the formula 

$$\text{Correlation} = \frac{\text{number of bonds common to the two tests}}{\text{geometrical mean of the two totals}}$$

we can calculate the correlations which these four tests 
would give, namely : 
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and we notice that all three tetrad-differences are zero. 
However, if we picked our four tests at random (taking 
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care only that they were of these degrees of richness) we 
would not always or often get the above pattern : in point 
of fact, we would get it only 12 times in 450. Nevertheless, 
it is one of the most probable patterns. In all, 78 different 
patterns of the bonds are possible (always adhering to our 
five, four, three, and two), the probability of each pattern 
ranging from 12 in 450 down to 1 in 450. One of the two 
least-probable patterns is the following : 
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This pattern gives the correlations : 

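          1         2         3         4
1         .       3/√20     2/√15     2/√10
2       3/√20       .       1/√12       0
3       2/√15     1/√12       .       2/√6
4       2/√10       0       2/√6        .

This time the tetrads are not zero, but 

$$r_{12}r_{34} - r_{13}r_{24} = \frac{6}{\sqrt{120}}, \qquad r_{13}r_{24} - r_{14}r_{23} = -\frac{2}{\sqrt{120}}, \qquad r_{14}r_{23} - r_{12}r_{34} = -\frac{4}{\sqrt{120}}$$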
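
The counting described here can be repeated mechanically. The following is a minimal sketch in Python, not the author's own procedure. It assumes the overlap formula (bonds in common over the geometrical mean of the two totals), and it assigns bond letters to the least-probable pattern in one way that gives correlations of 3/√20, 2/√15, 2/√10, 1/√12, 0, and 2/√6; the letters themselves are an assumption. It then runs through every labelled way of drawing the four tests. Taking r13·r24 - r14·r23 as the tetrad-difference to be tabulated is likewise an assumption.

```python
from itertools import combinations
from math import sqrt

BONDS = "abcdef"
RICHNESS = (5, 4, 3, 2)


def correlation(x, y):
    """Bonds in common over the geometrical mean of the two totals."""
    return len(x & y) / sqrt(len(x) * len(y))


def tetrads(t1, t2, t3, t4):
    """The three tetrad-differences of four tests given as sets of bonds."""
    r12, r13, r14 = correlation(t1, t2), correlation(t1, t3), correlation(t1, t4)
    r23, r24, r34 = correlation(t2, t3), correlation(t2, t4), correlation(t3, t4)
    return (r12 * r34 - r13 * r24,
            r13 * r24 - r14 * r23,
            r14 * r23 - r12 * r34)


# Hypothetical lettering of a least-probable pattern (an assumption).
pattern = (set("abcde"), set("abcf"), set("def"), set("de"))
print([round(t * sqrt(120), 6) for t in tetrads(*pattern)])

# Every labelled way of drawing tests of richness 5, 4, 3, 2 from six bonds.
subsets = [[frozenset(c) for c in combinations(BONDS, k)] for k in RICHNESS]
total = sum_d = sum_d2 = 0
for t1 in subsets[0]:
    for t2 in subsets[1]:
        for t3 in subsets[2]:
            for t4 in subsets[3]:
                # sqrt(120) times (r13*r24 - r14*r23) is a whole number.
                d = len(t1 & t3) * len(t2 & t4) - len(t1 & t4) * len(t2 & t3)
                total += 1
                sum_d += d
                sum_d2 += d * d

mean = sum_d / total / sqrt(120)
variance = sum_d2 / (120 * total)
print(total, mean, round(variance, 3))
```

Working with the whole number `d` keeps the arithmetic exact until the final division. The loop visits 27,000 labelled arrangements, which is 60 for each of the 450 cases counted in the text, and the variance it prints agrees with the figure which the text goes on to give for F₁.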



It is possible in this way to calculate the tetrad-differences 
for each one of the 78 possible patterns of overlap which 
can occur. When we then multiply each pattern by the 
expected frequency of its occurrence in 450 random 
choices of the four tests, we get 450 values for each tetrad- 
difference, distributed as follows : 
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Although the distribution of each F about zero is slightly 
irregular, the average value of each F is exactly zero. For 
F₁ the variance is 

$$\sigma^2 = \frac{2{,}164}{120 \times 450} = .040$$

We see, then, that in this universe of very primitive- 
minded men, whose brains can form only six bonds, four 
tests which demanded respectively five, four, three, and 
two bonds would give tetrad-differences whose expected 
value would be zero, the values actually found being 
grouped around zero with a certain variance. There is no 
particular mystery about the four " richnesses " five, four, 
three, and two, by the way. We might have taken any 
four " richnesses " and got a similar result. If we per- 
formed the still more laborious calculation of taking all 
possible kinds of four tests, we should have obtained again 
a similar result. If there are no linkages among the bonds, 
the most probable value of a tetrad-difference will always 
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be zero ; and if all possible combinations of the bonds are 
taken, the average of all the tetrad-differences will be zero. 
With only six bonds in the " mind," however, the scatter 
on both sides of zero will be considerable, as the above 
value of the standard deviation of F₁ shows, viz. 

$$\sigma = \sqrt{.040} = .20$$

5. A mind with twelve bonds. But as the number of 
bonds in the mind increases, the tetrad-differences crowd 
closer and closer to zero. Let us, for example, suppose 
exactly the same experiment as above conducted in a 
universe of men whose minds could form twelve bonds 
(instead of six), the four tests requiring ten, eight, six, and 
four of these (instead of five, four, three, and two) (Thomson, 
1927b). This increase in complexity enormously 
increases the work of calculating all the possible patterns 
of overlap, and the frequency of each. There are now 
1,257 different square tables of correlation coefficients and 
still more patterns of overlap, some of which, however, 
give the same correlations. When each possibility is taken 
in its proper relative frequency (ranging from once to 
11,520 times) there are no fewer than 1,078,110 instances 
required to represent the distribution. They have, 
nevertheless, all been calculated, and the distribution of 
F₁ was as follows : 

√1920 × F₁     Freq.    √1920 × F₁     Freq.    √1920 × F₁     Freq.    √1920 × F₁     Freq.
     20          225          7       17,760         -3       31,432        -13          624
     18        1,800          6       74,392         -4       72,676        -14        3,792
     16        1,755          5       15,744         -5       53,808        -15        4,144
     15        4,600          4       52,085         -6       49,328        -16        3,970
     14        3,840          3      121,608         -7       21,240        -18          112
     12       19,610          2       42,384         -8       41,951        -19          456
     11       10,632          1       28,096         -9        5,896        -20          584
     10        8,360          0      122,699        -10       29,184        -24           28
      9       26,696         -1       63,024        -11        8,960
      8       37,735         -2       81,208        -12       15,672

Total 1,078,110 

This table again gives an average value of F₁ exactly 
equal to zero. But the separate values of the tetrad-difference 
are grouped more closely round zero than 
before, with a variance now given by 

$$\sigma^2 = \frac{37{,}166{,}400}{1{,}920 \times 1{,}078{,}110} = .018$$

This is rather less than half the previous variance. 
Doubling the number of bonds in the imagined mind has 
halved the variance of the tetrad-differences. If we were 
to increase the number of potential bonds supposed to 
exist in the mind to anything like what must be its true 
figure, we would clearly reach a point where the tetrad- 
differences would be grouped round zero very closely 
indeed. 
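
The arithmetic of the twelve-bond table can be checked directly. The following is a minimal sketch in Python which takes the frequencies as read from the table above. The signs of several entries are restored on the assumption that the values run downwards from +20 to -24, a reading under which the total and the zero mean both agree with the text. The variable names are illustrative.

```python
from math import sqrt

# Value of sqrt(1920) * F1 mapped to its frequency, as read from the table.
TABLE = {
    20: 225, 18: 1800, 16: 1755, 15: 4600, 14: 3840, 12: 19610,
    11: 10632, 10: 8360, 9: 26696, 8: 37735, 7: 17760, 6: 74392,
    5: 15744, 4: 52085, 3: 121608, 2: 42384, 1: 28096, 0: 122699,
    -1: 63024, -2: 81208, -3: 31432, -4: 72676, -5: 53808, -6: 49328,
    -7: 21240, -8: 41951, -9: 5896, -10: 29184, -11: 8960, -12: 15672,
    -13: 624, -14: 3792, -15: 4144, -16: 3970, -18: 112, -19: 456,
    -20: 584, -24: 28,
}

total = sum(TABLE.values())                      # number of instances
sum_x = sum(x * f for x, f in TABLE.items())     # zero if the mean is zero
sum_x2 = sum(x * x * f for x, f in TABLE.items())

# Dividing by 1920 undoes the scaling of F1 by sqrt(1920).
variance = sum_x2 / (1920 * total)
print(total, sum_x, sum_x2)
print(round(variance, 3), round(sqrt(variance), 3))
```

The standard deviation printed in the last line may be set beside the .20 found for the mind with six bonds.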

The principle illustrated by the above concrete example 
can be examined by general algebraic means, and the above 
suggested conclusion fully confirmed (Mackie, 1928a, 
1929). It is found that the variance of the tetrad-differences 
sinks in proportion to 1/(N - 1), where N is the 
number of bonds, when N becomes large, and the above 
example agrees with this even for such small N's as 6 and 
12 : for 

$$\frac{6 - 1}{12 - 1} \times .040 = .018$$

as found. 
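
The same tendency can be tried by random sampling for larger numbers of bonds than it is practicable to enumerate. The following is a rough sketch in Python, assuming four tests which draw five-sixths, four-sixths, three-sixths, and two-sixths of N equally important bonds at random, the overlap formula for the correlations, and r13·r24 - r14·r23 as the tetrad-difference. The number of trials and the seed are arbitrary choices, and the results are estimates only, not exact values.

```python
import random
from math import sqrt


def tetrad_variance(n_bonds, trials=5000, seed=1):
    """Monte Carlo variance of r13*r24 - r14*r23 for four tests that
    sample 5/6, 4/6, 3/6 and 2/6 of n_bonds bonds at random."""
    rng = random.Random(seed)
    sizes = [n_bonds * k // 6 for k in (5, 4, 3, 2)]
    scale = sqrt(sizes[0] * sizes[1] * sizes[2] * sizes[3])
    bonds = range(n_bonds)
    total = total_sq = 0.0
    for _ in range(trials):
        t1, t2, t3, t4 = (set(rng.sample(bonds, k)) for k in sizes)
        # Both products of correlations share the same denominator, scale.
        f = (len(t1 & t3) * len(t2 & t4) - len(t1 & t4) * len(t2 & t3)) / scale
        total += f
        total_sq += f * f
    mean = total / trials
    return total_sq / trials - mean * mean


estimates = {n: tetrad_variance(n) for n in (6, 12, 24, 48)}
for n, v in estimates.items():
    # Second figure: the six-bond variance scaled by (6 - 1)/(N - 1).
    print(n, round(v, 4), round(0.040 * 5 / (n - 1), 4))
```

The two columns should agree roughly, the sampled variance shrinking as the number of bonds grows.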



In this mathematical treatment, bonds have been spoken 
of as though they were separate atoms of the mind, and, 
moreover, were all equally important. It is probably 
quite unnecessary to make the former assumption, which 
may or may not agree with the actual facts of the mind, 
or of the brain. Suitable mathematical treatment could 
probably be devised to examine the case where the causal 
background is, as it were, a continuum, different proportions 
of it forming tests of different degrees of richness. And as 
for the second assumption, it is in all likelihood merely 
formal. Let the continuum be divided into parts of equal 
importance, and then the number of these increased and 
their extent reduced, keeping their importance equal. 
What is necessary, to give the result that zero tetrads are 
so highly probable, is that it be possible to take our tests 
with equal ease from any part of the causal background ; that 
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there be no linkages among the bonds which will disturb the 
random frequency of the various possible combinations ; 
in other words, that there be no " faculties " in the mind. 
And it is also necessary that all possible tests be taken in 
their probable frequency. 

In any actual experiment, of course, it is quite imprac- 
ticable to take all possible tests, which are indeed infinite 
in number. A sample of tests is taken. If this sample 
is large and random, then there should, in a mind without 
separate " faculties," without linkages between its bonds, 
be an approach to zero tetrads. The fact that this ten- 
dency attracted Professor Spearman's attention, and was 
sufficiently strong to make him at first believe that all 
samples of tests showed it, provided care was taken to 
avoid tests so alike as to be almost duplicates (which 
would be " statistical impossibilities " in a random sample), 
indicates that the mind is indeed very free to use its bonds 
in any combination, that they are comparatively unlinked. 

6. Professor Spearman's objections to the sampling 
theory. A theory very similar to that of the sampling 
theory (but, as will be explained, with an entirely different 
meaning of sampling) had previously been considered by 
Professor Spearman (Spearman, 1914, 109 footnote), but 
had been dismissed by him because it would give a correla- 
tion between any two columns of the correlation matrix 
equal to the correlation between the two variates from 
which the columns derived, both of which correlations (he 
added) would on this theory average little more than zero 
(see also Spearman, 1928, Appendices I and II). A further 
objection raised by him (Abilities, 96) is that the " doctrine 
of chance," as he calls the sampling theory, would cause 
every individual to tend to equality with every other 
individual, than which, as he said, anything more opposed 
to the known facts could hardly be imagined. 

These conclusions, however, have been deduced from a 
form of sampling, if it can be called sampling, which differs 
from that proposed by the present writer in the sampling 
theory. In the " doctrine of chance " discussed by Spearman, 
each ability is expressed by an equation containing 
every one of the elementary components or bonds, each 
with a coefficient or loading (see Thomson, 1935b, 76 ; 
and Mackie, 1929, 30). The different abilities differ only 
in the loadings of the " bonds," and although some of 
these may be zero, the number of such zero loadings is 
insignificant. 

But the sampling theory assumes that each ability is 
composed of some but not all of the bonds, and that abilities 
can differ very markedly in their " richness," some needing 
very many " bonds," some only few. It further requires 
some approach to " all-or-none " reaction in the " bonds" ; 
that is, it supposes that a bond tends either not to come 
into the pattern at all, or to do so with its full force. This 
does not seem a very unnatural assumption to make. It 
would be fulfilled if a " bond " had a threshold below which 
it did not act, but above which it did act ; and this property 
is said to characterize neurone arcs and patterns. When 
this form of sampling is assumed (and it is submitted that 
this is the normal meaning of sampling), then neither do 
the correlations become zero with an infinity of bonds, nor 
men equal ; but the rank of the correlation matrix tends 
to be reducible to a small number, if all possible correlations 
are taken, and finally to be one as the bonds increase without 
limit. 

It is important to realize what is meant by the rank 
tending to rank 1 as more and more of the possible corre- 
lations are taken. When the rank is 1 the tetrad- 
differences are zero. But clearly, the reader may say, 
taking more and more samples of the bonds to form more 
and more tests will not change in any way the pre-existing 
tetrad-differences, will not make them zero if they are not 
zero to start with. That is perfectly true ; but that is not 
what is meant. As more and more tests are formed by 
samples of the bonds, the number of zero and very small 
tetrads will increase and swamp the large tetrads. The 
sampling theory does not say that all tetrads will be 
exactly zero, or the rank exactly 1. It says that the 
tetrads will be distributed about zero (not because each 
is taken both plus and minus, but when all are given their 
sign by the same rule) with a scatter which can be reduced 
without limit, in the sense that with more bonds the proportion 
of large tetrads becomes smaller and smaller ; 
always provided all possible samples are taken, i.e. that 
the family of correlation coefficients is complete. 

With a finite number of tests this, of course, is not the 
case ; but if the tests are a random sample of all possible 
tests, there will again be the approach to zero tetrads. 
The same will be true if the tests are sampling not the whole 
mind, but some portion of it, some sub-pool of our mind's 
abilities. If we stray from this pool and fish in other 
waters, we shall break the hierarchy ; but if we sampled 
the whole pool of a mind, we should again find the tendency 
to hierarchical order. If the mind is organized into sub- 
pools (such as the verbal sub-pool, say), then we shall be 
liable to fish in two or three of them, and get a rank of 
2 or 3 in our matrix, i.e. get two or three common factors, 
in the language of the other theory. 

7. Contrast with physical measurements. The tendency 
for tetrad-differences to be closely grouped around zero 
appears to be stronger in mental measurements than else- 
where ; stronger, for example, than in physical measure- 
ments (Abilities, 142-3). In the comparisons which have 
been made, there has been some injustice done to the 
physical distributions ; for diagrams have been published 
showing all the larger tetrads lumped together on to a 
small base so as to make the distribution look actually 
U-shaped. If, however, equal units are used throughout, 
the tetrad-differences are seen to be distributed here also 
in a bell-curve centred on zero (Thomson, 1927a),* though 
with a variance a good deal larger than is found in mental 
measurements (especially, of course, when the latter have 
been purified of all tests which give large tetrad-differences !). 
In spite of the difficulty of arriving, therefore, at 
a fair judgment with such evidence, it seems nevertheless 
likely that physical measurements do indeed show a 
weaker tendency to zero tetrads. For the tendency to 
zero tetrads, outlined above, due to the measurements 
sampling a complex of many bonds, will show itself only 
when the measurements in a battery are a fairly random 
sample of all the measurements which might be made. 

* In the paper quoted (Thomson, 1927a), the author mistakenly 
took each tetrad-difference with the sign obtained by beginning in 
every case with the north-west element. It is, however, Professor 
Spearman's practice to take every tetrad-difference twice, once 
positive and once negative. If this be done, a histogram like that 
on page 249 of the paper quoted becomes, of course, perfectly 
symmetrical. This change could be made throughout the paper 
without in any way affecting its main argument. The figure on 
page 249 (Thomson, 1927a) should be compared with that on 
page 143 of The Abilities of Man. 

Now, in physical measurements this is not the case. We 
do not measure a person's body just from anywhere to 
anywhere. We observe organs and measure them : leg, 
cranium, chest girth, etc. The variates are not a random 
sample of all conceivable variates. In other words, the 
physical body has an obvious structure which guides our 
measurements. The background of innumerable causes 
which produce just this particular body which is before us 
cannot act in all directions, but only in linked patterns. 
The tendency to zero tetrad-differences in the mind is due 
to the fact that the mind has, comparatively speaking, no 
organs. We can, and do, measure it almost from anywhere 
to anywhere. No test measures a leg or an arm of the 
mind ; every test calls upon a group of the mind's bonds 
which intermingles in most complicated ways with the 
groups needed for other tests, without being a set pattern 
immutably linked into an organ. Of all the conceivable 
combinations of the bonds of the mind we can, without 
great difficulty, take a random sample, whereas in physical 
measurements we take only the sample forced on us by the 
organs of the body. Being free to measure the mind almost 
from anywhere to anywhere, we can get a set of measure- 
ments which show " hierarchical order " without overgreat 
trouble. We can do so because the mind is so compara- 
tively structureless. Mental measurements tend to show 
hierarchical order, and to be susceptible of mathematical 
description in terms of one general factor and innumerable 
specifics, not because there are specific neural machines 
through which its energy must show itself, but just exactly 
because there are no fixed neural machines. The mind is 
capable of expressing itself in the most plastic and Protean 
way, especially before education, language, the subjects of 
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the school curriculum, the occupation, and the political 
beliefs of adult life have imposed a habitual structure on 
it. It is not without significance that the " factor " most 
widely recognized after Spearman's g is the verbal factor v, 
the mother-tongue being, as it were, the physical body of 
the mind, its acquired structure. 

8. Interpretation of g and the specifics on the sampling 
theory. We saw in Chapter III that the fraction express- 
ing the square of the saturation of a test with g expresses 
in the sampling theory the fraction of the whole mind, 
or of the sub-pool of the mind, which that test forms. If 
the hierarchical battery is composed of extremely varied 
tests, which cover very different aspects of the mind's 
activity, this fraction may be taken as being of the whole 
mind ; of the whole mind, that is, of an ideal man who can 
perform all of these tests perfectly, and all others which 
can extend their hierarchy. When we estimate a person's 
g, from such a battery, we are deducing a number which 
expresses how far that person is above or below average 
in the number of these bonds which his mind can form. 
This interpretation of g agrees well with an opinion arrived 
at, from quite another line of approach, by E. L. Thorndike, 
who on and near page 415 of his Measurement of Intelligence 
enunciates what has been called by others the Quantity 
Hypothesis of intelligence : that one mind is more intelligent 
than another simply because it possesses more interconnections 
out of which it can make patterns. 
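
The connection between saturation and the fraction of the mind sampled can be shown with small figures. The following is a minimal sketch in Python, assuming the overlap formula of Section 4 and unlinked random sampling, so that two tests drawing n_i and n_j bonds from N have on the average n_i·n_j/N bonds in common. The six-bond mind and the test names are illustrative only.

```python
from math import sqrt

N_BONDS = 6
RICHNESS = {"test 1": 5, "test 2": 4, "test 3": 3, "test 4": 2}

# Saturation with g taken as the square root of the fraction of the mind sampled.
loading = {name: sqrt(n / N_BONDS) for name, n in RICHNESS.items()}


def expected_correlation(n_i, n_j, n_bonds=N_BONDS):
    """Average overlap n_i*n_j/n_bonds over the geometrical mean of the totals."""
    return (n_i * n_j / n_bonds) / sqrt(n_i * n_j)


for a, n_a in RICHNESS.items():
    for b, n_b in RICHNESS.items():
        if a < b:
            r = expected_correlation(n_a, n_b)
            # Each average correlation equals the product of two loadings.
            print(a, b, round(r, 4), round(loading[a] * loading[b], 4))
```

Since every average correlation is the product of two loadings, the table of averages is of rank 1, and the square of each loading is simply the fraction n/N which the test forms of the whole.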

The difference in point of view between the sampling 
theory and the two-factor theory is that the latter looks 
upon g as being part of the test, while the former looks 
upon the test as being part of g. The two-factor theory 
is therefore compelled to postulate specific factors to 
account for the remainder of the variance of the test, and 
has to go on to offer some suggestion as to what specific 
factors are : perhaps neural engines. The sampling theory 
simply says that the test requires only such and such a 
fraction of the bonds of the whole mind, the same fraction 
which, on the two-factor theory, g forms of the variance 
of the test. For it, specific factors are mere figments, 
which do not arise unless, as can be done, the mathematical 
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equations which represent the tests are so manipulated 
that there appears to be only one link connecting them all. 
The sampling theory does not make this transformation 
of the equations (see Appendix, paragraph 6). Those who 
do so, if they adhere to the interpretation that g means all 
the bonds of the whole mind, have to suppose that the 
whole mind first takes part in each activity, but that in 
addition a specific factor is concerned ; which specific factor, 
since they have already invoked the whole mind, must be for 
them a second action of part of the mind annulling its former 
assistance, which is absurd. The two-factor equations 
then do not allow us to consider g as being all the bonds 
of the mind. They are mathematically equivalent to the 
sampling equations, but not psychologically or neurologi- 
cally. To the holder of the sampling theory, the factors 
of the other view are statistical entities only, g an average 
(or a total) of all a man's bonds, a specific factor the 
contrast between performance in any particular test and 
a person's general ability (Bartlett, 1937, 101-2). As a 
manner of speaking, the two-factor theory appears to the 
author to be much more likely to " catch on " with the 
man in the street, but much more likely to lead to the 
hypostatization of mere mathematical coefficients. The 
sampling theory lacks the good selling-points of the other, 
but is comparatively free from its dangers, and seems much 
more likely to come into line, in due time, with physio- 
logical knowledge of the action of the nervous system. 

9. Absolute variance of different tests. It will be noted, 
too, that on the sampling theory the different tests will 
naturally have different variances, the " richer " tests 
having a wider scatter. This seems only natural. It is 
customary, at any rate in theoretical discussions, to reduce 
all scores in different tests to standard measure, thereby 
equalizing their variance. This seems inevitable, for there 
is no means of comparing the scatter of marks in two 
different tests. But it does not follow that the scatter 
would be really the same if some means of comparison 
were available. When the same test is given to two 
different groups we have no hesitation in ascribing a wider 
variance to the one or the other group, and it seems conceivable 
that a similar distinction might mentally be made 
between the scores made by one group in two different 
tests. The writer is completely in accord with M. S. Bart- 
lett when he says (Bartlett, 1935, 205) : " I think many 
people would agree . . . that the variation in mathematical 
ability displayed even in a selected group such as Cam- 
bridge Tripos candidates cannot be altogether put down 
to the method of marking adopted by the examiners." 
We may put these mathematics marks into standard 
measure, and we may put the marks scored by the same 
group in, say, a form-board test, also into standard measure. 
But that does not imply that at bottom the two variances 
are equal, if only we had some rigorous way of comparing 
them. Our common sense tells us plainly that they are 
not equal in the absolute sense, though for many purposes 
their difference is irrelevant. It seems to be no defect, 
then, but rather a good quality, of the sampling theory 
to involve different absolute variances. 
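
The wider scatter of the richer tests can be seen in the six-bond illustration. The following is a minimal sketch in Python under assumptions which are not in the text: every one of the 64 possible sets of bonds is taken as an equally likely mind, and a raw score is the number of the test's bonds which the mind possesses. The lettering of the four tests is hypothetical.

```python
from itertools import product
from math import sqrt

BONDS = "abcdef"
# Hypothetical tests of richness five, four, three, and two.
TESTS = {"five": set("abcde"), "four": set("abcf"),
         "three": set("def"), "two": set("de")}

# Assumption: each of the 2**6 possible sets of bonds is an equally likely mind.
MINDS = [{b for b, has in zip(BONDS, flags) if has}
         for flags in product((0, 1), repeat=len(BONDS))]


def scores(test):
    """Raw score of every mind: how many of the test's bonds it possesses."""
    return [len(test & mind) for mind in MINDS]


def covariance(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


variances = {name: covariance(scores(t), scores(t)) for name, t in TESTS.items()}
print(variances)

# Raw-score correlation of the two richest tests against the overlap formula.
x, y = scores(TESTS["five"]), scores(TESTS["four"])
r = covariance(x, y) / sqrt(variances["five"] * variances["four"])
print(r, 3 / sqrt(20))
```

Under these assumptions the variance is proportional to the richness, and the correlation of the raw scores is the same as that given by the overlap formula, so that reduction to standard measure conceals the difference in scatter without altering the correlations.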

10. A distinction between g and other common factors. 
The writer is inclined, as the earlier sections of this chapter 
imply, to make a distinction in interpretation between the 
Spearman general factor g and the various other common 
factors, mostly if not all of less extent than g, which have 
been suggested. When properly measured by a wide and 
varied hierarchical battery, g appears to him to be an 
index of the span of the whole mind, other common factors 
to measure only sub-pools, linkages among bonds. The 
former measures the whole number of bonds ; the latter 
indicate the degree of structure among them. 

Some of this " structure " is no doubt innate ; but more 
of it is probably due to environment and education and 
life. Its expression in terms of separate uncorrelated 
factors suggests what is almost certainly not the case, that 
the " sub-pools " are separate from one another. The 
actual organization is likely to be much more complicated 
than that, and its categories to be interlaced and inter- 
woven, like the relationships of men in a community, 
plumbers and Methodists, blonds, bachelors, smokers, 
conservatives, illiterates, native-born, criminals, and 
school-teachers, an organization into classes which cut 
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across one another right and left. No doubt these too 
could be replaced, and for some purposes replaced with 
advantage, by a smaller number of uncorrelated common 
factors and a large number of factors specific to plumbers, 
smokers, and the rest. But the factors would be pure 
figments. What the factorist calls the verbal factor, for 
example, is something very different from what the world 
recognizes as verbal ability. The latter is a compound, 
at least of g and v, and possibly of other factors. The v 
of the factorist is something uncorrelated with g, something 
which the person of low g is just as likely to have as the 
person with high g. But this is only so long as factors 
are kept orthogonal. The acceptance of oblique factors, 
as in Thurstone's system, would change this, and be in 
accordance with popular usage. 

Further, it is improbable that the organization of each mind 
is the same. The phrase " factors of the mind " suggests 
too strongly that this is so, and that minds differ only in 
the amount of each factor they possess. It is more than 
likely that different minds perform any task or test by 
different means, and indeed that the same mind does so at 
different times. 

Yet with all the dangers and imperfections which attend 
it, it is probable that the factor theory will go on, and will 
serve to advance the science of psychology. For one thing, 
it is far too interesting to cease to have students and 
adherents. There is a strong natural desire in mankind 
to imagine or create, and to name, forces and powers 
behind the façade of what is observed, nor can any exception 
be taken to this if the hypotheses which emerge 
explain the phenomena as far as they go, and are a guide 
to further inquiry. That the factor theory has been a 
guide and a spur to many investigators cannot be denied, 
and it is probably here that it finds its chief justification. 



CHAPTER XXI 

THE MAXIMUM LIKELIHOOD METHOD OF 
ESTIMATING FACTOR LOADINGS 

(by D. N. Lawley) 

1. Basis of statistical estimation. In recent times attempts 
have been made to introduce into factorial analysis statis- 
tical methods developed in other fields of research. In 
particular the method of statistical estimation put forward 
by Fisher (1938, Chapter IX), and termed the method of 
maximum likelihood, has been applied by Lawley (1940, 
1941, 1943) to the problem of estimating factor loadings. 
This method has the property of using the largest amount 
of available information contained in the data and gives 
" efficient " estimates, where such exist, of all unknown 
parameters, i.e. estimates which, roughly speaking, are on 
the average nearer the true values than those obtained by 
other, " inefficient," methods of estimation. 

Before using the maximum likelihood method for esti- 
mating factor loadings it is necessary to make certain 
initial assumptions. We assume that both the test scores 
and the factors, of which they are linear functions, are 
normally distributed throughout the population of indi- 
viduals to be tested. This assumption of normality has 
been the subject of some criticism, but in practice it would 
appear that departure from strict normality of distribution 
is not very serious. It is also necessary to make some 
hypothesis concerning the number of general factors 
which are present in addition to specifics. We shall later 
on show how this hypothesis may be tested, and how it 
may be determined whether the number assumed is, in fact, 
sufficient to account for the data. 

2. A numerical example. In order to illustrate the calcu- 
lations needed we shall reproduce an example used by 
Lawley (1943), where eight tests were given to 443 individuals. 
The table below gives the correlations between 
the eight tests, unities having for convenience been placed 
in the diagonal cells. In this example the hypothesis 
made is that two general factors, together with specifics, 
are sufficient to account for the observed correlations. 

Test      1      2      3      4      5      6      7      8

  1   1.000   .312   .405   .457   .500   .350   .521   .564
  2    .312  1.000   .460   .316   .279   .173   .339   .288
  3    .405   .460  1.000   .394   .380   .258   .433   .323
  4    .457   .316   .394  1.000   .460   .222   .516   .486
  5    .500   .279   .380   .460  1.000   .239   .441   .417
  6    .350   .173   .258   .222   .239  1.000   .302   .262
  7    .521   .339   .433   .516   .441   .302  1.000   .547
  8    .564   .288   .323   .486   .417   .262   .547  1.000



The method of estimation about to be described is one 
of successive approximations. Each successive step in the 
calculations gives a set of factor loadings which are nearer 
to the final values than those of the previous set. To 
start the process it is only necessary to guess or to find by 
some means (e.g. by a centroid analysis) first approxima- 
tions to the factor loadings. Any set of figures within 
reason will serve the purpose, though, of course, the better 
the approximation the fewer steps in the calculation will 
be needed. For illustration we shall take as first approxi- 
mations to the factor loadings the set of values given below: 

                               Tests
Trial
loading in      1      2      3      4      5      6      7      8

Factor I      .73    .50    .66    .66    .62    .40    .73    .70
Factor II     .17   −.27   −.47    .08    .06    .02    .10    .29
Specific
variance    .4382  .6771  .3435  .5580  .6120  .8396  .4571  .4259

Under the loadings are written the corresponding first 
approximations to the specific variances (the total variance 
of each test being taken to be unity). They are as usual 
found by subtracting from unity the sums of squares of 
the loadings for each test. 

The calculations necessary for obtaining second approximations 
to the loadings in factor I may now be set out as 
follows : 

(a)  1.666   .738  1.921  1.183  1.013   .476  1.597  1.644 
(b)  5.647  3.895  5.132  5.129  4.830  3.100  5.647  5.412 
(c)  4.917  3.395  4.472  4.469  4.210  2.700  4.917  4.712 

     $h_1^2 = 45.724$          $1/h_1 = 0.14789$ 

(d)   .727   .502   .661   .661   .623   .399   .727   .697 

The first row of figures, row (a), is found by dividing the 
trial loadings in factor I by the corresponding specific 
variances. The figures in row (b) are then given by the 
inner products (see footnote, page 31) of row (a) with the 
successive rows (or columns) of the correlation table 
printed above, and row (c) is obtained by subtracting 
from the figures in row (b) the corresponding loadings in 
factor I. The quantity $h_1^2$ is given by the inner product 
of rows (a) and (c), and hence, taking the square root of the 
reciprocal of this quantity, we find $1/h_1$. Finally, row (d) 
is obtained by multiplying the figures in row (c) by $1/h_1$, 
or .14789. The resulting numbers are then second 
approximations to the loadings of the tests in factor I. 
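
The steps for rows (a) to (d) may be retraced mechanically. What follows is only a sketch in Python under stated assumptions: the function and variable names are invented for illustration, the correlations and trial loadings are those printed above, and small differences in the last decimal place from the hand-worked figures are to be expected.

```python
# Correlation matrix of the eight tests, unities in the diagonal cells.
R = [
    [1.000, .312, .405, .457, .500, .350, .521, .564],
    [.312, 1.000, .460, .316, .279, .173, .339, .288],
    [.405, .460, 1.000, .394, .380, .258, .433, .323],
    [.457, .316, .394, 1.000, .460, .222, .516, .486],
    [.500, .279, .380, .460, 1.000, .239, .441, .417],
    [.350, .173, .258, .222, .239, 1.000, .302, .262],
    [.521, .339, .433, .516, .441, .302, 1.000, .547],
    [.564, .288, .323, .486, .417, .262, .547, 1.000],
]
trial_I = [.73, .50, .66, .66, .62, .40, .73, .70]
trial_II = [.17, -.27, -.47, .08, .06, .02, .10, .29]


def dot(x, y):
    """Inner product of two rows of figures."""
    return sum(a * b for a, b in zip(x, y))


def specific_variances(*factors):
    """Unity minus the sum of squares of the loadings of each test."""
    n = len(factors[0])
    return [1 - sum(f[i] ** 2 for f in factors) for i in range(n)]


def first_factor_step(R, loadings, spec):
    """One cycle for factor I: returns rows (a), (b), (c), h squared, row (d)."""
    a = [l / v for l, v in zip(loadings, spec)]      # row (a)
    b = [dot(a, row) for row in R]                   # row (b)
    c = [bi - li for bi, li in zip(b, loadings)]     # row (c)
    h_sq = dot(a, c)                                 # inner product of (a) and (c)
    d = [ci / h_sq ** 0.5 for ci in c]               # row (d), i.e. (c) times 1/h
    return a, b, c, h_sq, d


spec = specific_variances(trial_I, trial_II)
a, b, c, h_sq, d = first_factor_step(R, trial_I, spec)
print("specific variances:", [round(v, 4) for v in spec])
print("h squared:", round(h_sq, 3))
print("row (d):", [round(x, 3) for x in d])
```

Repeating the call with row (d) in place of the trial loadings, and with the specific variances recomputed, would give the next cycle for factor I.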

The most direct way of obtaining second approximations 
to the loadings in factor II is to find the residual matrix 
which results from removing the effect of factor I, and to 
treat it in the same way as the original matrix, using this 
time the trial loadings in factor II. A less direct but con- 
siderably shorter method may, however, be obtained by using 
once more the original matrix and modifying the process 
slightly. The necessary calculations are as shown below : 

(e)   .388  −.399 −1.368   .143   .098   .024   .219   .681 
(f)   .330  −.560  −.980   .150   .113   .038   .190   .580 

     $p_1 = -.0234$ 

(g)   .177  −.278  −.495   .085   .068   .027   .107   .306 

     $k_1^2 = 1.1080$          $1/k_1 = .9500$ 

(h)   .168  −.264  −.470   .081   .065   .026   .102   .291 

Row (e) is found by dividing the trial loadings in factor II 
by the corresponding specific variances, while the numbers 
in row (f) are given by the inner products of row (e) with 
the rows of the correlation table. 

The step by which row (g) is obtained from row (f) is 
a little more complicated than the corresponding step in 
the calculations for the first-factor loadings. From each 
number in row (f) we subtract not only the corresponding 
trial loading in factor II, but also a correction which 
eliminates the effect of factor I ; this correction consists 
of the corresponding number in row (d) multiplied by 
−.0234, the inner product of rows (e) and (d). Thus, for 
example, the number .177 in row (g) is equal to 

$$.330 - .170 - .727 \times (-.0234).$$

In general, where more than two factors are assumed to be 
present and where further approximations are being calculated 
for the loadings in the rth factor, there will be (r − 1) 
such corrections to be subtracted, one for each of the 
preceding factors. 

Having found row (g) the quantity $k_1^2$ is now given by the 
inner product of rows (e) and (g), from which, taking the 
square root of the reciprocal, we derive $1/k_1$. Row (h) 
is then obtained by multiplying the figures in row (g) by 
$1/k_1$, or .9500. We have thus found second approximations 
to the loadings in factor II. 
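
The second-factor cycle, with its correction for factor I, may be sketched in the same way. The names below are again invented for illustration; the second approximations to factor I are simply taken as printed in row (d), and the function is written so that a list of several earlier factors could be passed, one correction being subtracted for each.

```python
# Correlation matrix of the eight tests, unities in the diagonal cells.
R = [
    [1.000, .312, .405, .457, .500, .350, .521, .564],
    [.312, 1.000, .460, .316, .279, .173, .339, .288],
    [.405, .460, 1.000, .394, .380, .258, .433, .323],
    [.457, .316, .394, 1.000, .460, .222, .516, .486],
    [.500, .279, .380, .460, 1.000, .239, .441, .417],
    [.350, .173, .258, .222, .239, 1.000, .302, .262],
    [.521, .339, .433, .516, .441, .302, 1.000, .547],
    [.564, .288, .323, .486, .417, .262, .547, 1.000],
]
trial_II = [.17, -.27, -.47, .08, .06, .02, .10, .29]
spec = [.4382, .6771, .3435, .5580, .6120, .8396, .4571, .4259]
row_d = [.727, .502, .661, .661, .623, .399, .727, .697]  # factor I, as printed


def dot(x, y):
    """Inner product of two rows of figures."""
    return sum(a * b for a, b in zip(x, y))


def later_factor_step(R, loadings, spec, earlier_factors):
    """One cycle for a factor after the first.

    earlier_factors holds the new loadings of every preceding factor;
    one correction is subtracted for each of them.
    """
    e = [l / v for l, v in zip(loadings, spec)]       # row (e)
    f = [dot(e, row) for row in R]                    # row (f)
    g = [fi - li for fi, li in zip(f, loadings)]      # subtract trial loadings
    corrections = []
    for d in earlier_factors:
        p = dot(e, d)                                 # inner product of (e) and (d)
        corrections.append(p)
        g = [gi - p * di for gi, di in zip(g, d)]     # remove effect of that factor
    k_sq = dot(e, g)                                  # inner product of (e) and (g)
    new = [gi / k_sq ** 0.5 for gi in g]              # row (h), i.e. (g) times 1/k
    return e, f, corrections, g, k_sq, new


e, f, corrections, g, k_sq, new_II = later_factor_step(R, trial_II, spec, [row_d])
print("p:", round(corrections[0], 4))
print("k squared:", round(k_sq, 4))
print("row (h):", [round(x, 3) for x in new_II])
```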

The whole cycle of calculations may now be repeated 
over and over again until the required degree of accuracy 
is reached. In practice, provided that the initial trial 
loadings are not too far out, one repetition of the process 
will usually be found sufficient. In our example the final 
estimates (with possible slight errors in the last decimal 
place) were as follows : 

                               Tests
Loading in      1      2      3      4      5      6      7      8

Factor I      .725   .503   .664   .661   .623   .399   .726   .694
Factor II     .172  −.261  −.468   .087   .069   .027   .106   .291
Specific
variance      .445   .679   .340   .556   .607   .840   .462   .434

Having obtained these figures, there is, of course, no 
objection to rotating the factors as desired in order to 
reach a psychologically acceptable position. 

3. Testing significance. A difficulty in most systems of 
factorial analysis is to know how many factors it is worth- 
while to " take out," and to decide how many of them may 
be considered significant. From a statistical point of 
view objections can be raised against the majority of 
methods at present in use for this purpose. When, how- 
ever, the number of individuals tested is fairly large, the 
maximum likelihood method provides a satisfactory means 
of testing whether the factors fitted can be considered 
sufficient to account for the data. 

To illustrate this let us return to the example of the 
previous section. It is first of all necessary to calculate 
the matrix of residuals obtained when the effect of both 
factors is removed from the original correlation matrix. 
For this purpose we use the final estimates of the loadings 
as already given. The residual matrix, with the specific 
variances inserted in the diagonal cells, is as follows : 

          1       2       3       4       5       6       7       8

  1   (.445)  −.008    .004   −.037    .036    .056   −.024    .011
  2   −.008  (.679)    .004    .006   −.016   −.021    .001    .015
  3    .004    .004  (.340)   −.004   −.001    .006    .001   −.002
  4   −.037    .006   −.004  (.556)    .042   −.044    .027    .002
  5    .036   −.016   −.001    .042  (.607)   −.011   −.019   −.035
  6    .056   −.021    .006   −.044   −.011  (.840)    .009   −.023
  7   −.024    .001    .001    .027   −.019    .009  (.462)    .012
  8    .011    .015   −.002    .002   −.035   −.023    .012  (.434)

We are now able to calculate a criterion, which we shall 
denote by w, for deciding whether the hypothesis that only 
two general factors are present should be accepted or 
rejected. Each of the above residuals is squared and 
divided by the product of the numbers in the corresponding 
diagonal cells. Thus, for example, the residual for 
Tests 4 and 7 is squared and divided by the product of 
the fourth and seventh diagonal elements, giving the result 

$$\frac{(.027)^2}{.556 \times .462} = .002838.$$

There are altogether 28 such terms, one for each residual, 
and w is obtained by forming the sum of these terms and 
multiplying it by 443, the number in the sample. The 
result is found to be 20.1. 
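
The summation may be sketched as below. The residuals are entered to three decimals, with signs as they follow from the final loadings (the signs do not affect the squares), and the names are invented for illustration; with figures rounded in this way the total comes out close to, though not exactly at, the 20.1 of the text.

```python
N = 443  # number of persons in the sample

# Specific variances, i.e. the diagonal cells of the residual matrix.
v = [.445, .679, .340, .556, .607, .840, .462, .434]

# Residuals above the diagonal, row by row (tests 1-2, 1-3, ..., 7-8).
upper = [
    [-.008, .004, -.037, .036, .056, -.024, .011],
    [.004, .006, -.016, -.021, .001, .015],
    [-.004, -.001, .006, .001, -.002],
    [.042, -.044, .027, .002],
    [-.011, -.019, -.035],
    [.009, -.023],
    [.012],
]

# One term per pair of tests: squared residual over product of diagonal cells.
terms = {}
for i, row in enumerate(upper):
    for offset, r in enumerate(row):
        j = i + 1 + offset
        terms[(i + 1, j + 1)] = r * r / (v[i] * v[j])

w = N * sum(terms.values())
print("number of terms:", len(terms))
print("term for tests 4 and 7:", round(terms[(4, 7)], 6))
print("w:", round(w, 1))
```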

When the number in the sample is fairly large w is 
distributed approximately as χ² with degrees of freedom 
given by 

$$\tfrac{1}{2}\{(n - m)^2 - n - m\},$$

where n is the number of tests and m is the assumed number 
of factors. To test whether the above value of w is 
significant we now use a χ² table such as is given by 
Fisher and Yates (1943, page 31). In our case, putting 
n = 8 and m = 2, the number of degrees of freedom is 13. 
Entering the χ² table with 13 degrees of freedom, we find 
that the 1 per cent significance level is 27.7. This means 
that if our hypothesis that only two general factors are 
present is correct, then the chance of getting a value of w 
greater than 27.7 is only 1 in 100. If, therefore, we had 
obtained a value of w greater than 27.7 we should have 
been justified in rejecting the above hypothesis and in 
assuming the existence of more than two general factors. 
In our case, however, the value of w is only 20.1, well below 
the 1 per cent significance level. We have thus no 
grounds for rejection, and although we cannot state that 
only two general factors are present, we have no reason to 
assume the existence of more than two. 
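
The count of degrees of freedom and the comparison with the tabled value can be put in a few lines. This is a sketch only: the function name is invented, and the 1 per cent level of 27.7 is taken from the text rather than computed.

```python
def degrees_of_freedom(n_tests, m_factors):
    """Half of {(n - m)^2 - n - m}; the bracket is always an even number."""
    return ((n_tests - m_factors) ** 2 - n_tests - m_factors) // 2


w = 20.1                 # criterion found in the text
one_per_cent_level = 27.7  # chi-squared table, 13 degrees of freedom (from the text)

df = degrees_of_freedom(8, 2)
reject = w > one_per_cent_level
print("degrees of freedom:", df)
print("reject the two-factor hypothesis:", reject)
```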

It must be emphasized that the method described above 
is not applicable if other, inefficient, estimates of the 
loadings are substituted for the maximum likelihood 
estimates. For the value of χ² would in that case be 
greatly exaggerated, causing us to over-estimate its 
significance. For this reason we cannot, for example, 
use the method for testing the significance of the re- 
siduals left when factors have been fitted by the centroid 
method. 

4. The standard errors of individual residuals. A method 
has now* been developed for finding the standard errors 
of individual residuals. This should be useful when a few 
of the residuals are very large, while the rest are small. 
In such a case one or more of the residuals may be highly 
significant, when tested individually, even though the 
value of χ² does not attain significance. The method 
ignores errors of estimation of the specific variances, which 
are not, however, likely to be very large provided that the 
number of tests in the battery is not too small. 

* Lawley in the Proc. Roy. Soc. Edin., 1949. 

Let us denote by $l_i$, $m_i$ the estimated loadings of the ith 
test in the first and second factors respectively (assuming 
the existence of only two factors). Let $v_i$ be the specific 
variance of the ith test, and let us write 

$$h = \sum_i \frac{l_i^2}{v_i}, \qquad k = \sum_i \frac{m_i^2}{v_i}.$$

Then the standard error of the residual for the ith and jth 
tests (i ≠ j) is given by 

$$\sqrt{\frac{e_{ii}\, e_{jj} + e_{ij}^2}{N}},$$

where N is the number of persons in the sample, 

$$e_{ii} = v_i - \frac{l_i^2}{h} - \frac{m_i^2}{k},$$

and 

$$e_{ij} = -\frac{l_i l_j}{h} - \frac{m_i m_j}{k}.$$
This formula may, of course, be easily extended to take 
into account any number of factors. 

Let us illustrate the use of the above formula with the 
same numerical example as before. If we wish to test the 
significance of the residual for the first and fourth tests 
after removing two factors, we have 

    $l_1$ = .725        $m_1$ = .172        $v_1$ = .44479 
    $l_4$ = .661        $m_4$ = .087        $v_4$ = .55551 
    $h$ = 6.7185        $k$ = 1.0528 

Hence 

    $e_{11}$ = .33845      $e_{44}$ = .48329      $e_{14}$ = −.08554 

and 

$$\sqrt{\frac{.33845 \times .48329 + (.08554)^2}{443}} = .0196.$$

Thus the residual in question has a value of −.037 with a 
standard error of .020. It is clearly not significant. 
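
The worked figures for tests 1 and 4 can be retraced as follows. This is a sketch under assumptions: the names are invented, the values of h, k and the specific variances are those quoted in the example, and the expressions for the e quantities are written so as to reproduce the quoted figures (specific variance, less the squared or cross-multiplied loadings divided by h and by k).

```python
import math

N = 443                  # number of persons in the sample
h, k = 6.7185, 1.0528    # as quoted in the worked example

l1, m1, v1 = 0.725, 0.172, 0.44479   # test 1: loadings and specific variance
l4, m4, v4 = 0.661, 0.087, 0.55551   # test 4: loadings and specific variance


def e_diag(l, m, v):
    """Diagonal quantity: v - l^2/h - m^2/k."""
    return v - l * l / h - m * m / k


def e_off(la, ma, lb, mb):
    """Off-diagonal quantity: -(la*lb)/h - (ma*mb)/k."""
    return -(la * lb) / h - (ma * mb) / k


e11 = e_diag(l1, m1, v1)
e44 = e_diag(l4, m4, v4)
e14 = e_off(l1, m1, l4, m4)

standard_error = math.sqrt((e11 * e44 + e14 ** 2) / N)
print(round(e11, 5), round(e44, 5), round(e14, 5))
print("standard error of the residual:", round(standard_error, 4))
```

A residual of −.037 is thus rather less than twice its standard error.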

5. The standard errors of factor loadings. When maximum 
likelihood estimation has been used, we are able to 
find the standard errors of not only the residuals but also 
the estimated factor loadings. Using the same notation as 
in the preceding section, the sampling variance of $l_i$, the 
loading of the ith test in the first factor, is (assuming the test 
to be standardised) 

$$\frac{1}{N}\left(1 + \frac{1}{h}\right)\left\{1 - \frac{1}{2}\left(1 + \frac{1}{h}\right) l_i^2\right\},$$

N being the number of persons in the sample, 
and the standard error is the square root of this. 

The covariance between any two first factor loadings $l_i$ 
and $l_j$ is given by 



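The formulae for the variances and covariances of the 
subsequent factor loadings are more complex. Thus the 
variance of $m_i$, the loading of the ith test in the second 
factor, is 

$$\frac{1}{N}\left(1 + \frac{1}{k}\right)\left\{1 - \left(1 + \frac{1}{h}\right) l_i^2 - \frac{1}{2}\left(1 + \frac{1}{k}\right) m_i^2\right\},$$

while the covariance between $m_i$ and $m_j$ is 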



The results for the general case, where more than two 
factors have been assumed present, may be written down 
without difficulty. Each factor will give rise to one more 
term within the curly brackets than the preceding factor. 
It should be noted that the last of such terms, and that 
alone, is multiplied by ½. 

One interesting property of maximum likelihood estimates 
is that the loadings in any one factor are uncorrelated 
with those in any other factor. Thus any one of 
$l_1, l_2, l_3, \ldots$ is uncorrelated with any one of $m_1, m_2, m_3, \ldots$ 



It must be stressed that all the above results are applic- 
able only to the unrotated loadings. 
In our numerical example, we find 

$$1 + \frac{1}{h} = 1.14884, \qquad 1 + \frac{1}{k} = 1.9498.$$

Hence the variance of $l_1$, for example, is 

$$\frac{1.14884}{443}\left\{1 - \tfrac{1}{2} \times 1.14884 \times (.725)^2\right\} = .001810,$$

while that of $m_1$ is 

$$\frac{1.9498}{443}\left\{1 - 1.14884 \times (.725)^2 - \tfrac{1}{2} \times 1.9498 \times (.172)^2\right\} = .001617.$$

Thus the loading of test 1 in the first factor is .725 with a 
standard error of 

$$\sqrt{.001810} = .043$$

and its loading in the second factor is .172 with a standard 
error of 

$$\sqrt{.001617} = .040.$$
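
The arithmetic of these two variances may be checked with a short sketch. The names are invented for illustration, and the expressions are read off from the worked figures: for each factor a multiplier of the form one plus a reciprocal, and within the curly brackets one term for each factor up to the one in question, the last alone being halved.

```python
import math

N = 443                  # number of persons in the sample
h, k = 6.7185, 1.0528    # as quoted in the preceding section
l1, m1 = 0.725, 0.172    # final loadings of test 1 in factors I and II

a = 1 + 1 / h            # multiplier belonging to factor I
b = 1 + 1 / k            # multiplier belonging to factor II

# First factor: only one term in the brackets, and it is halved.
var_l1 = (a / N) * (1 - 0.5 * a * l1 ** 2)

# Second factor: the factor I term enters whole, the factor II term is halved.
var_m1 = (b / N) * (1 - a * l1 ** 2 - 0.5 * b * m1 ** 2)

se_l1 = math.sqrt(var_l1)
se_m1 = math.sqrt(var_m1)
print(round(a, 5), round(b, 4))
print(round(var_l1, 6), round(var_m1, 6))
print(round(se_l1, 3), round(se_m1, 3))
```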

6. Advantages and disadvantages. To sum up : the 
chief advantage of the maximum likelihood method of 
estimating factor loadings is that it does lead to efficient 
estimates and does provide a means of deciding how many 
factors may be considered necessary. It unfortunately 
takes, however, much longer to perform than a centroid 
analysis, particularly when the battery of tests is a large 
one and when several factors are to be fitted. The chief 
labour of the process lies in the calculation of the various 
inner products ; although in this respect it does not differ 
greatly from Hotelling's method of finding " principal 
components." The maximum likelihood method is thus 
likely to be most useful in cases where accurate estimation 
is desirable and where it is proposed to make a test of 
significance. 

The method also possesses the advantage of being 
independent of the units in which the test scores are 
measured. The same system of factors is therefore 
obtained whether the correlation or the covariance matrix 
is analysed. The loadings in the one case are directly 
proportional to those in the other. 

Postscript. For a more detailed exposition of Lawley's 
method, with checks, see Emmett (1949). 



CHAPTER XXII 

SOME FUNDAMENTAL QUESTIONS 

IT seems advisable to conclude with a brief discussion of 
some of the fundamental theoretical questions needing an 
answer. Among these are the following, of which (1) 
and (3) are rather liable to be forgotten by those actually 
engaged in making factorial analyses : 

(1) What metric or system of units is to be used in 
factorial analysis ? 

(2) On what principle are we to decide where to stop the 
rotation of our factor-axes or how to choose them so that 
rotation is unnecessary ? 

(3) Is the principle of minimizing the number of 
common factors, i.e. of analysing only the communal 
variance, to be retained ? 

(4) Are oblique, i.e. correlated, factors to be permitted ? 

1. Metric. Most of the work done in factorial analysis 
has assumed the scores of the tests to be standardized ; 
that is to say, in each test the unit of measure has been the 
actual standard deviation found in the distribution. 
This is in a sense a confession of ignorance. The accidental 
standard deviation which happens to result from the par- 
ticular form of scoring used in a test means, of course, 
nothing more. Yet there is undoubtedly something to be 
said for the probability of real differences of standard 
deviation existing between tests (see Chapter XX, 
Section 9). In that case, if we knew these real standard 
deviations, we would use variances and covariances and 
analyse them, not correlations (compare Hotelling, 1933, 
421-2 and 509-10). 

Burt has urged the use of variances and covariances, 
which are indeed necessary to him to enable his relation 
between trait factors and person factors to hold (see Chap- 
ter XIV, page 214). But the variances and covariances 
he actually uses are simply the arbitrary ones which arise 
from the raw scores, and depend entirely upon the scoring 
system used in each test. It would seem necessary to 
have some system of rational, not arbitrary, units. 

Hotelling has already suggested one such, based upon 
the idea of the principal components of all possible tests, 
but it would seem to be unattainable in practice (Hotel- 
ling, 1933, 510). Another can be based on the ideas of the 
sampling theory and has already been foreshadowed in 
Chapter XX, Section 9. Tests quite naturally have 
different variances on that theory, since they comprise 
larger or smaller samples of the " bonds " of the mind 
(see Thomson, 1935b, 87). In a hierarchical battery these 
natural variances are measured by the " coefficient of 
richness " (Chapter III, Section 2, page 45). The 
" richness " of Test k is given by 

$$\frac{r_{ik}\, r_{jk}}{r_{ij}},$$

the same quantity as the square of Spearman's " satura- 
tion with g." It is, on the sampling theory, the fraction 
which the test forms of the pool of bonds which is being 
sampled, and is the natural variance of the test in compari- 
son with other tests from that pool. The " saturation 
with g " of Spearman's theory is the " natural standard 
deviation " of the sampling theory. Even in a battery 
which is not hierarchical, the formula (Chapter IX, 
Section 5, page 154) 

2 J^t 



T -2A 

will give a rough estimate of the natural standard deviation 
of each test. The general principle is that tests which 
show the most total correlation have the largest natural 
variance. 

2. Rotation. Our views on the rotation of factors will 
depend on what we want them to do. Burt looks upon 
them as merely a convenient form of classification and is 
content to take the principal axes of the ellipsoids of density, 
or that approximation to them given by a good centroid 
analysis, as they stand, without any rotation. He " takes 
out " the first centroid factor, either by calculation or 
by selecting a very special group of persons each of whom 
has in a battery of tests an average score equal to the 
population average, each of the tests also having the same 
average as every other test in the battery over this subgroup 
of persons (Burt, 1938). He concentrates attention 
on the remaining factors, which are " bipolar," having 
both positive and negative weights in the tests. When, 
as in the article referred to, he is analysing temperaments, 
this fits in well with common names for emotional charac- 
teristics, for those names too are usually bipolar, as 
brave-cowardly, extravagant-stingy, extravert-introvert, 
and so on. 

Thurstone, on the other hand, emphatically insists on 
the need for rotation if the factors are to have psycho- 
logical meaning (Thurstone, 1938a, 90). The centroid 
factors are mere averages of the tests which happen to 
form the battery, and change as tests are added or taken 
away, whereas he wants factors which are invariant from 
battery to battery. I think he would put invariance 
before psychological meaning, and say that if a certain 
factor keeps turning up in battery after battery we must 
ask ourselves what its psychological meaning is. His 
own opinion, backed up by a great deal of experimental 
work of a pioneering and exploratory nature, is that his 
principle of rotating to " simple structure " gives us also 
psychologically meaningful and invariant factors. 

The problems of rotation and metric are not unconnected, 
and one piece of evidence in favour of rotating to simple 
structure is that the latter is independent of the units 
used in the tests. If instead of analysing correlations we 
analyse covariances, with whatever standard deviations 
we care to assign to the tests, we get a centroid analysis 
quite different from the centroid analysis of correlations. 
But if we rotate each to simple structure the tables are 
identical, except, of course, that in the covariance structure 
each row is multiplied by the standard deviation of the 
test. 

For example, if we take the six tests of Chapter XVI, 
Section 4 (page 246) and ascribe arbitrary standard 
deviations of 1, 2, 3, 4, 5, and 6 to them, we can replace the 
correlations and communalities by covariances and 
variance-communalities, and perform a centroid analysis. 
Since we know the proper communalities,* it comes out 
exactly in three factors with no residues, and gives the 
centroid structure : 

Test        I         II        III

 1        .372      .567      .462
 2        .948     1.278     −.060
 3       1.969    −1.016     −.337
 4       1.002     1.072    −2.118
 5       2.992      .593     1.716
 6       3.379    −2.493      .337

When this is rotated to simple structure, by post-
multiplication by the matrix 

      .802      .389      .453
     −.592      .416      .691
      .080     −.822      .564

the resulting table is : 

Test        A         B         C

 1          .         .       .820
 2          .       .950     1.278
 3       2.154      .619        .
 4          .      2.577        .
 5       2.187        .      2.732
 6       4.213        .         .

This is identical with the simple structure found from 
the correlations, if the rows here are divided by 1, 2, 3, 4, 
5, and 6, the standard deviations. It is definitely a point 
in favour of simple structure that it is thus independent 
of the system of units employed. Spearman's analysis of 
a hierarchical matrix into one g and specifics also has this 
property of independence of the metric. If the tetrad-
differences of a matrix of correlations are zero, and we 
analyse into one general factor and specifics, it is immaterial 
whether we analyse correlations or covariances. The 
loadings obtained in the latter case are exactly the same 
except, of course, that each is multiplied by the appropriate 
standard deviation. 

* If we have to guess communalities, our two simple structures 
will differ slightly because the highest covariance in a column may 
not correspond to the highest correlation. But with a battery of 
many tests this difference will be unimportant, and could be 
annulled by iteration. 
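
The rotation of the covariance centroid structure can be checked numerically. In the sketch below the names are invented, and the signs of the second and third centroid factors (with the matching rows of the rotating matrix) are an assumption: they are chosen so that each of those centroid columns sums to zero and the product reproduces the printed simple structure. Reversing a centroid factor together with its row of the rotating matrix would serve equally well.

```python
# Centroid structure from covariances (standard deviations 1 to 6).
centroid = [
    [0.372, 0.567, 0.462],
    [0.948, 1.278, -0.060],
    [1.969, -1.016, -0.337],
    [1.002, 1.072, -2.118],
    [2.992, 0.593, 1.716],
    [3.379, -2.493, 0.337],
]

# Rotating matrix: rows are centroid factors I, II, III; columns are A, B, C.
rotation = [
    [0.802, 0.389, 0.453],
    [-0.592, 0.416, 0.691],
    [0.080, -0.822, 0.564],
]

std_devs = [1, 2, 3, 4, 5, 6]


def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Simple structure in covariance units.
simple_cov = matmul(centroid, rotation)

# Dividing each row by the test's standard deviation gives correlation units.
simple_corr = [[x / s for x in row] for row, s in zip(simple_cov, std_devs)]

for test, row in enumerate(simple_cov, start=1):
    print(test, [round(x, 3) for x in row])
```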

At this point one is reminded of Lawley's loadings* 
found by the method of maximum likelihood, for these 
possess the property that the unrotated loadings obtained 
from correlations are already the same as the unrotated 
loadings obtained from covariances, if the latter are 
divided by the standard deviations. Centroid analyses, 
or principal component analyses, do not possess this 
property. The loadings obtained by these means from 
covariances cannot be simply divided by the standard 
deviations to give the loadings derived from correlations, 
though the one can be rotated into the other. Lawley's 
loadings need no such rotation. They are, as it were, at 
once of the same shape whether from covariances or from 
correlations and only need an adjustment of units, such as 
one makes in changing, say, from yards to feet. A field 
which is 50 yards broad and 20 poles long has the same 
shape as one which is 150 feet broad and 330 feet long. 
(American readers may need, I do not know, to be told 
that a pole equals 5½ yards.) 

Now, as we have seen, this property of equivalence of 
covariance and correlation loadings is also possessed by 
simple structure. It would thus not be unnatural to hope 
that Lawley's method might lead straight to simple 
structure, without any rotation. But this does not seem 
to be the case. If we take the known simple structure 
loadings of a set of correlations as trial loadings in Lawley's 
method, they ought to come out unchanged from his 
calculations, if they are the goal towards which those 
calculations converge ; but they don't. Clearly, then, 
simple structure is not the only position of the axes where 
the loadings are independent of the units of measurement 
employed. Indeed, any subsequent post-multiplication of 
both the simple structure tables (both that from correlations 
and that from covariances) by the same rotating 
matrix will leave their equivalence with regard to units 
unharmed. Simple structure is only one of an infinite 
number of positions which possess this property. But 
it is an easily identifiable one. 

* In accordance with our definition on page 272, the term 
"loading" means a coefficient in a specification equation, an entry in a 
"pattern." In the present chapter it is used throughout and is 
strictly correct when the axes referred to are orthogonal. If the 
axes are oblique, then much of what is said really refers to the items 
in a structure, not in a pattern ; but the word "loading" is still used 
to avoid circumlocutions, and because the structure of the reference 
vectors is, except for a diagonal matrix multiplier, identical with the 
pattern of the factors. 

It is difficult to keep one's mind clear as to the meaning 
of this. Let me recapitulate. There are some processes 
of analysis which, while they give a perfect analysis in the 
sense of one which reproduces the correlations (or the 
covariances) exactly, do not give the same analysis for the 
correlations as for the covariances. The factors they 
arrive at depend upon the units of measurement employed 
in the tests. Such, for example, are the principal compon- 
ents process and the centroid process. Such processes 
cannot be relied on to give, straight away and without 
rotation, factors which can be called objective and scien- 
tific. Some processes, on the other hand, do give analyses 
which are independent of the units. One such is Lawley's, 
based on maximum likelihood. Another is Thurstone's 
simple-structure process, which, though it begins by using 
a centroid analysis, follows this by rotation of a certain 
kind. 

But the principle of independence of units does not 
distinguish between these processes, which both satisfy it. 
Still less does it distinguish between systems of factors. 
For any one of the infinite number of such systems which 
can be got from either simple structure or Lawley's factors 
by rotation equally satisfies the principle. Indeed, there 
can really be no talk of a system of factors satisfying the 
principle. Any table of loadings whatever, obtained from 
correlations, has, of course, corresponding to it a system 
differing only in that the rows are multiplied by coefficients, 
a system which would correspond with covariances. 
The fact that no one has discovered a process which gives 
both is irrelevant. The argument is rather as follows. If 
a worker believes that he has found a process which gives 
the true psychological factors, then that process must be 
independent of the metric, and simple structure and 
maximum likelihood are both thus independent, though 
they do not, alas, agree. Nor must it be forgotten that 
analyses from correlations are in no way superior to those 
from covariances. Indeed, correlations are covariances, 
dependent upon as arbitrary a choice of units, namely 
standard deviations, as any other. But centroid axes 
in themselves, or principal components, without rotation, 
are clearly inadmissible, for they change with the units 
used. The chance that such axes are the true ones is 
infinitesimal, being dependent on the chance composition 
of the battery, and the system of units which chances to 
be used. Independence of metric is not sufficient to 
validate a process, but it is necessary. Its absence does 
not prove a system of factors to be wrong, but it makes it 
certain that the process by which they have been arrived 
at does not in general give the true factors. 

3. Specifics. These form a fundamental problem in 
factorial analysis, and yet they are practically never heard 
of in discussions of an analysis. They are considered in 
Chapter VIII, where it is pointed out that although it is 
reasonable enough to think that a test may require some 
trick of the intellect peculiar to itself, yet it is not obvious 
that these specific factors must be made as large and 
important as possible ; and that is what the plan of 
minimizing the rank of a matrix does. The excess of 
factors over tests, which inevitably, of course, results from 
postulating a specific in every test, means that the factors 
cannot be estimated with any great accuracy. Usually 
the accuracy is very low indeed. The determinate and the 
indeterminate parts of each of Thurstone's factors in 
Primary Mental Abilities can be found by post-multiplying 
Table 7 on his page 98 by Table 3 on his page 96. We find : 

               Variance of the      Variance of the
Factor         Estimated Part       Indeterminate Part

S                  .611                 .389
P                  .616                 .384
N                  .825                 .175
V                  .662                 .338
M                  .431                 .569
W                  .439                 .561
I                  .397                 .603
R                  .600                 .400
D                  .519                 .481

In three cases less than half of the factor variance has 
been estimated. The average for the nine factors is 56⅔ 
per cent of the variance estimated. In other words, the 
factor estimates have large probable errors, in some cases 
as large as the estimates themselves. This has serious 
consequences, not to be overcome by more reliable tests. 
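
The summary figures just quoted follow directly from the table. A brief sketch, with names invented for illustration and the indeterminate part taken simply as unity less the estimated part:

```python
# Variance of the estimated part of each of the nine factors, from the table.
estimated = {
    "S": .611, "P": .616, "N": .825, "V": .662, "M": .431,
    "W": .439, "I": .397, "R": .600, "D": .519,
}

# The indeterminate part is what is left of unit variance.
indeterminate = {name: 1 - value for name, value in estimated.items()}

average = sum(estimated.values()) / len(estimated)
under_half = [name for name, value in estimated.items() if value < 0.5]

print("average estimated:", round(100 * average, 1), "per cent")
print("factors with less than half estimated:", under_half)
```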

To anyone pondering this, it seems somewhat para- 
doxical to be told that, if the battery of tests is large, the 
guess we make for the communalities does not much 
matter. No great change will result in the centroid 
loadings if unity is employed as communality, i.e. if no 
specifics are assumed to exist. It would seem, then, that 
their presence or absence is immaterial ! But is that so ? 

It is true that the larger the battery, the less the indeterminacy.
But Thurstone's battery above was as large
as any is ever likely to be. Using unity for every diagonal 
element in the matrix of such a battery will give factors 
(supposing the same number of them to be taken out) 
which will not imitate the correlations quite so well, but 
which can be estimated accurately. 

In fact, whether Hotelling's process or the centroid 
process is used, with unit communalities, each factor can 
be calculated exactly for a man, given his scores. By 
exactly we mean that they are as accurate as his scores are. 
Of course, in any psychological experiment the scores may 
not be accurate in the sense that they can be exactly 
reproduced by a repetition of the experiment. Apart from 
sheer blunders and clerical errors, there is the fact that a 
man's performance fluctuates from day to day. But 
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these errors are common to any process of calculation 
which may be used on the scores. These are not the errors 
for which we are criticizing estimates of a man's factors. 
The point we are making is that factors based on com- 
munalities less than unity have a further, and large, error 
of estimation, whereas factors based on unit communalities 
(even if only one or two or a few are taken out) have no 
such further error of estimation (see page 79). 

If a few such factors taken out with unit communalities 
are then rotated (keeping them in the same space, i.e. not 
changing their number) they still remain susceptible of 
exact estimation in a man. 

But the reader may protest that he does not want these 
factors, for they have no psychological meaning. Better 
inexact estimates of the proper factors than exact estimates 
of the wrong ones. It may be retorted that there is no 
unanimity about which are the proper ones. It is for each 
to say what he thinks most desirable. If he is content 
with factors which are in the same space as the tests, he 
can estimate them exactly, and he can rotate them too 
if he keeps them in that space. But if he rotates them out 
of it, to make them more real, let him remember that he 
loses on the roundabouts exactly what he gains upon the 
swings, for he can only measure their projections on the 
test space. Such factors are like trees and houses to a 
man who is doomed only to see the ground. He can 
measure the length of a tree's shadow, but not its height. 
He must remain content with shadows. And so with 
factors. We must either remain content with factors
which are within the test space, or must reconcile ourselves 
to measuring only the shadows of factors outside that 
space. If with twenty tests we postulate twenty-five 
factors (five general and the twenty specifics), then we are 
operating in a space five of whose dimensions we know 
nothing about, and our factor estimates suffer accordingly. 
The use of communalities enables us to imitate correlations 
rather better, but dooms us to estimate factors badly. 

Such is the argument against communalities. For 
them is the hope that some day, despite their drawbacks, 
the factors they lead to may prove to be something real, 




perhaps have some physiological basis. Their defender 
may plead that the estimates of these factors are as good as 
the estimates we find useful, in predicting educational or 
occupational efficiency. In the table given above, of the 
variances of the estimated parts of Thurstone's factors, it 
is the square roots of those variances which give the 
correlations of estimates with " fact " (though " fact " 
here is an elusive ghost of a thing). Those correlations 
do not look so bad. That for N is .908, and the worst is
.630.

4. Oblique factors. I think it is pretty certain that 
Thurstone took to oblique factors because he wants simple 
structure at all costs. Certainly oblique factors make it 
much easier to reach simple structure; too easy, Reyburn
and Taylor say. It will be found far more often than it 
really exists, they add. On the other hand, Thurstone 
can point to his box example and his trapezium example 
and say with truth that simple structure enabled him 
to find " realities," can say that the oblique simple struc- 
ture is something more real, in the ordinary common-sense 
everyday use of the word, than the orthogonal second- 
order factors which are an alternative. 

Other workers, not at all wedded to the ideas of simple 
structure, have also declared their belief in oblique factors, 
e.g. Raymond Cattell, and, I think, many who feel inclined 
to work in terms of " clusters." In ordinary life, weight 
and height are both measures of something real, although 
they are correlated. We could analyse them into two 
uncorrelated factors a and b, or into three for that matter,
but certainly no one would use these in ordinary life. It 
is, however, just conceivable that some pair of hormones 
(say) might be found which corresponded, not one of them 
to height and one to weight, but one to orthogonal factor 
a and another to orthogonal factor b. It is far too early 
to state anything more than a preference for orthogonal 
or oblique factors. Opinion is turning, I think, toward 
the acceptance of the latter. 
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ADDENDUM (1945) TO PAGE 19 
Bifactor Analysis

THE following example will illustrate some of the points of 
this method. Consider these correlations, which to save 
space are printed without their decimal points : 
There are two stages in a bifactor analysis. The first
problem is to decide how to group the tests so that those 
are brought together which share a second or group factor. 
Then the best method of calculating is needed to find the 
loadings. 

The grouping can partly be done subjectively by con- 
sidering the nature of each test and putting together 
memory tests, or tests involving number, and so on. 
Holzinger uses a " coefficient of belonging," B, to determine 
the coherence of a group. B is equal to the average of the 
intercorrelations of the group divided by their average 
correlation with the other tests in the battery. The higher 
B is, the more the group is distinguishable as a group. 
He begins with a pair of tests which correlate highly with 
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one another, and finds their B. Then he adds a third test 
and finds the B of the three. Then another and another, 
until B drops too low. There is no fixed threshold for B, 
but a rather sudden drop would indicate the end of a 
group. 
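
The coefficient of belonging described above can be sketched as follows. This is an illustration only: the five-test correlation matrix is hypothetical (it is not the matrix of this addendum), and the function names are invented for the purpose. B is computed exactly as stated, the average intercorrelation within the group divided by the average correlation of its members with the remaining tests.

```python
def average(values):
    values = list(values)
    return sum(values) / len(values)


def belonging(r, group):
    """Average correlation inside `group` divided by the average
    correlation of its members with the tests outside the group."""
    others = [t for t in range(len(r)) if t not in group]
    inside = average(r[i][j] for i in group for j in group if i < j)
    outside = average(r[i][j] for i in group for j in others)
    return inside / outside


# Hypothetical correlations for five tests (not the author's data).
r = [
    [1.00, .60, .55, .20, .25],
    [.60, 1.00, .50, .22, .18],
    [.55, .50, 1.00, .24, .21],
    [.20, .22, .24, 1.00, .30],
    [.25, .18, .21, .30, 1.00],
]

b_pair = belonging(r, [0, 1])         # begin with a highly correlated pair
b_triple = belonging(r, [0, 1, 2])    # add a third test of the same kind
b_four = belonging(r, [0, 1, 2, 3])   # adding a stranger makes B drop

print(round(b_pair, 3), round(b_triple, 3), round(b_four, 3))
```

In this invented example B stays high while the group is coherent and falls when a test from outside the cluster is added, which is the sudden drop the text speaks of.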

Another plan is to make a graph or profile of each row 
of correlations and compare these (Tryon, 1939), grouping 
together those tests with similar profiles. I find it easier 
to consider only the peaks of each row and compare the 
rows with regard to these. If we mark, in each row of the 
above, the five highest correlations in that row, and also 
the diagonal cell, we get the following set of peaks : 

[Table of peaks: Tests 1 to 12 in rows and in columns, with an X in the diagonal cell and in the five highest cells of each row.]

We then see that, in the rows, 

(a) Tests 3, 7, 10, 11 have identical peaks,

(b) Tests 2, 8, 12 have identical peaks,

(c) Tests 4, 6 have identical peaks,

and we take these as nuclei for three groups. There re- 
main Tests 1, 5, and 9. Their average correlations with 
each of the above nuclei are : 
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We therefore add Test 1 to group c, Test 5 to group a, 
and (less certainly) Test 9 to group b. We then rewrite 
our matrix of correlations with the tests thus grouped : 

3   5   7  10  11     2   8   9  12     1   4   6

4.07







It will be seen that certain additions have been made in 
readiness for the various methods of calculation of the g 
loadings which are then possible. If we symbolize the 
above table as 



        A   D   E
        D   B   F
        E   F   C



all methods depend on using only the correlations in the 
rectangles D, E, and F, since the suspected group factors 
which increase the correlations in A, in B, and in C do not 
influence D, E, and F. Each correlation in the latter 
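rectangles is therefore the product of two g-saturations
(see page 9). Thus:

$$r_{31} = .40 = l_3 l_1, \qquad r_{32} = .34 = l_3 l_2, \qquad r_{12} = .57 = l_1 l_2,$$

$$l_3^2 = \frac{r_{31}\, r_{32}}{r_{12}} = \frac{.40 \times .34}{.57} = .24, \qquad l_3 = .49,$$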
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where it should be noted that the three correlations come 
from E, D, and F respectively. 
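
The triad just described can be worked out mechanically. The sketch below assumes that the three correlations are .40 (Test 3 with Test 1, from E), .34 (Test 3 with Test 2, from D), and .57 (Test 1 with Test 2, from F); the pairing of values with tests is an assumption made for this illustration, and the variable names are hypothetical.

```python
import math

r_31 = .40   # Test 3 with Test 1 (rectangle E), assumed pairing
r_32 = .34   # Test 3 with Test 2 (rectangle D), assumed pairing
r_12 = .57   # Test 1 with Test 2 (rectangle F), assumed pairing

# If only g is concerned, r_31 * r_32 / r_12 = (l3 l1)(l3 l2)/(l1 l2) = l3 squared.
l3_squared = r_31 * r_32 / r_12
l3 = math.sqrt(l3_squared)

print(round(l3_squared, 2), round(l3, 2))
```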

But this value for the loading of Test 3 depends upon three 
correlations only and would, in a real experimental set of 
data, vary somewhat with our choice of the three. A 
method of using all the possible correlations in these three 
rectangles is needed. One such is given by Holzinger in 
his Manual (1937a).

If all possible ways of choosing the two other tests are
taken, and the fraction $r_{3i} r_{3j} / r_{ij}$ formed in each case; and if
the numerators of these fractions are added together to
form a global numerator, and their denominators to form
a global denominator; it will then be found that the
fraction thus formed is equal to

$$l_3^2 = \frac{1.14 \times .85}{4.07} = .24$$

and this time all available correlations have been used.
The rule is to multiply the two totals in the row of the
test (1.14 × .85) and divide by the grand total of the
block formed by the other tests concerned (1, 4, and 6
with 2, 8, 9, and 12, i.e. 4.07). For Test 2 this rule gives

$$l_2^2 = \frac{1.86 \times 1.21}{4.62} = .49, \qquad l_2 = .70.$$
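
Holzinger's rule is easily put into a few lines. In the sketch below the function name is hypothetical; the row totals 1.14 and .85 and the block total 4.07 are those quoted for Test 3, and for Test 2 the block total is taken to be 4.62, one of the three rectangle totals quoted later in this addendum (an assumption, since the figure is not legible at this point).

```python
import math


def holzinger_loading(row_total_1, row_total_2, block_total):
    """g loading of a test from its two row totals and the grand total
    of the block formed by the other two groups."""
    return math.sqrt(row_total_1 * row_total_2 / block_total)


l3 = holzinger_loading(1.14, .85, 4.07)    # Test 3, group a
l2 = holzinger_loading(1.86, 1.21, 4.62)   # Test 2, group b (4.62 assumed)

print(round(l3, 2), round(l2, 2))
```

The rule works because, when only g is concerned, each row total is the test's loading multiplied by the sum of the loadings of one other group, and the block total is the product of those two sums.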

This Holzinger method is not difficult to extend to four 
or more groups. If we symbolize a four-group matrix by 

A D E G 

D B F H 

E F C K 

G H K L 

and consider the first test, then its g-loading $l_1$ is given by

$$l_1^2 = \frac{de + dg + eg}{F + H + K}$$

where d, e, and g are the sums of its row in D, E, and G,
and F, H, and K in the denominator stand for the totals of those rectangles.
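
The four-group extension can be verified on artificial data. The sketch assumes the formula to read: the square of the loading equals (de + dg + eg) divided by (F + H + K), with F, H, and K the totals of those rectangles. The loadings are hypothetical, and every correlation between groups is built as the product of two g loadings, in accordance with the bifactor hypothesis.

```python
import math

# Hypothetical g loadings for four groups of two tests each (illustrative only).
groups = [[.6, .7], [.5, .8], [.4, .9], [.3, .65]]


def block_total(p, q):
    """Total of the rectangle of correlations between groups p and q."""
    return sum(lp * lq for lp in groups[p] for lq in groups[q])


def row_total(loading, q):
    """Sum of one test's row of correlations with the tests of group q."""
    return sum(loading * lq for lq in groups[q])


l1 = groups[0][0]
d, e, g = row_total(l1, 1), row_total(l1, 2), row_total(l1, 3)
F, H, K = block_total(1, 2), block_total(1, 3), block_total(2, 3)

recovered = math.sqrt((d * e + d * g + e * g) / (F + H + K))
print(recovered)
```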




Another method is given by Burt (1940, 478). For the 
numerator of each g loading he takes the sum of the side 
totals which Holzinger multiplied. Thus the numerators 
are : 

for Test 3, 1.14 + .85 = 1.99
         5, 1.78 + 1.32 = 3.10

         2, 1.86 + 1.21 = 3.07
        12, 1.09 + .72 = 1.81

         6, 1.46 + 1.30 = 2.76.

The denominators differ in group a, group b, and group c,
but all are formed from the three quantities 6.24, 4.62,
and 4.07. For group a the denominator is:

$$\{6.24 + 4.62\}\sqrt{\frac{4.07}{6.24 \times 4.62}} = 4.08.$$

It will be seen that the two quantities within the curly
brackets are the totals of D and E, the two rectangles
from which the numerators of group a come. By analogy
the reader can write down the denominators of group b
and group c; they come to 4.40 and 5.01. Dividing the
numerators by the appropriate denominators, we get for
the g loadings:

Test        3    5    7   10   11     2    8    9   12     1    4    6

g Loading  .49  .76  .24  .62  .55   .70  .33  .90  .41   .82  .36  .55
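
Burt's calculation may be sketched as follows. The form of the denominator used here (the sum of two rectangle totals multiplied by the square root of the third total over the product of those two) is an assumption reconstructed from the description above; it does reproduce the quoted denominators 4.40 and 5.01. The numerators are those listed for Tests 3, 5, 2, 12, and 6, and the names are hypothetical.

```python
import math

D, E, F = 6.24, 4.62, 4.07   # totals of the three rectangles


def denominator(x, y, third):
    """(x + y) * sqrt(third / (x * y)), where x and y are the totals of the
    two rectangles supplying the numerators and `third` is the remaining one."""
    return (x + y) * math.sqrt(third / (x * y))


den_a = denominator(D, E, F)   # group a: numerators come from D and E
den_b = denominator(D, F, E)   # group b: numerators come from D and F
den_c = denominator(E, F, D)   # group c: numerators come from E and F

# Numerator (sum of the two side totals) and denominator for each test.
terms = {3: (1.99, den_a), 5: (3.10, den_a),
         2: (3.07, den_b), 12: (1.81, den_b),
         6: (2.76, den_c)}
loadings = {test: num / den for test, (num, den) in terms.items()}

print(round(den_a, 2), round(den_b, 2), round(den_c, 2))
print({test: round(value, 2) for test, value in loadings.items()})
```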

The proof of Burt's formula is surprisingly easy. If the
reader will write down, in place of the correlations in D,
E, and F, the literal symbols $l_i l_k$ (for $r_{ik}$), since our
hypothesis is that only g is concerned in these correlations,
and will write out the sums, etc., of the above calculation
literally, he will find that Burt's formula simplifies almost
immediately to one $l$, that of the test in question. Burt
only gives his formula for three groups. It can be extended
to the case of more groups, but becomes cumbersome and
rather unwieldy.
Now comes the test of whether our grouping is correct,
and our hypothesis valid that groups a, b, and c have
nothing in common but the factor g. Using the loadings
we have found, form all the products $l_i l_k$ and subtract
them from the experimental correlations. All the correlations
in D, E, and F should then vanish or, in a real set
of data (ours are artificial), become insignificant. There
should, however, remain residues in A, B, and C due to the
second factors running through groups a, b, and c respectively.
In our example the subtraction of the quantities
$l_i l_k$ gives the following:





             3   5   7  10  11     2   8   9  12     1   4   6

g Loadings  49  76  24  62  55    70  33  90  41    82  36  55
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The correlations left in A, if they are due to only one 
other factor (now that g has been removed) ought to show 
zero or very small tetrads ; and so they do. Those in B 
are also hierarchical. Those in C are too few to form a 
tetrad. The second factor in each of these submatrices 
can now be found in the same way as g is found from a 
matrix with no other factor : see page 9 and, later in this 
book, pages 153 to 155. The reader should complete the 
calculation, and will find these loadings:
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Factors 
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An actual set of data will not give so perfect a hollow 
staircase, but at this stage the strict bifactor hypothesis 
can be departed from and additional small loadings or 
further factors added to perfect the analysis. Where a 
bifactor pattern exists, a simple method of extracting 
correlated or oblique factors has been given by Holzinger 
(1944) " based on the idea that the centroid pattern 
coefficients for the sections of approximately unit rank 
may be interpreted as structure values for the entire 
matrix." 

Cluster analysis is connected with the bifactor method, 
which is possible when clusters do not overlap. But it is 
by no means rare to find two or three variables entering 
into several distinct clusters. Raymond Cattell's article
(1944a) describes four methods of determining clusters, and 
gives references which will lead the interested reader back 
to much of the previous work. See also the later part of 
our Chapter XVIII, where Reyburn and Taylor's method is 
described, and see also Tryon's work Cluster Analysis, 1939. 
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ADDENDUM (1945) TO PAGE 92 
Variances and Covariances of Regression Coefficients 

A METHOD of calculating regression coefficients is described 
on pages 89 to 93. A somewhat longer method has two 
advantages : it permits the easy calculation of regression 
coefficients for any criterion (or many) when once the main 
part of the computation is completed, and, what is of great 
importance, it enables the probable errors of the coefficients, 
and of their differences, to be found quickly. 

Before describing it, a note about page 93 may be 
useful. The .720 in slab B is the weight for Test 1 when it
alone is used; the weights .611 and .158 in slab C are for
Tests 1 and 2 when they alone form the battery; .582,
.153, and .066 are for a battery of Tests 1, 2, and 3; and
finally the bottom row gives the weights for all four tests.

The method referred to in the first paragraph above is 
to find first of all the reciprocal of the matrix of correla- 
tions of the tests. The way to do this is described on 
page 190 and will be better understood from this present 
example. There is only the one kind of computation 
throughout, viz. the evaluation of tetrad-differences 
involving the pivot, which is the number in the top left- 
hand corner of each slab. The reciprocal matrix appears 
at the bottom (and smaller ones on the way down).

The check column is sometimes not properly used. 
The check consists in seeing that the sum of the row is 
identical with the tetrad. Thus .177 is the sum of its
row, and it is also the tetrad

1 × 1.26 − .69 × 1.57 = .177.

The reader will see that space could be saved in the 
calculation opposite by omitting the rows containing ones 
only ; and also that nearly half the numbers can be written 
down from symmetry. 

After the reciprocal matrix has been found it should be
checked to see that its product with the original matrix 
gives the unit matrix (see page 191). 

The regression coefficients for any criterion are then 
obtained by multiplying the rows of the reciprocal by the 
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criterion correlations and then adding the columns. In 
the example of pages 91-3 we multiply the first row of the
reciprocal by .72, the second by .58, and so on. The
addition of the columns then gives the same regression 
coefficients as were found on page 93. 
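
The rule of multiplying rows and adding columns may be illustrated on the smallest case, a battery of Tests 1 and 2 only. The sketch assumes that the correlation between these two tests is .69 (the figure appearing in the tetrad above) and uses the criterion correlations .72 and .58; it should then reproduce, within rounding, the weights .611 and .158 quoted for slab C. The helper function is hypothetical.

```python
def inverse_2x2(m):
    """Reciprocal of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


r12 = .69                  # correlation of Tests 1 and 2 (assumed)
criterion = [.72, .58]     # correlations of Tests 1 and 2 with the criterion

reciprocal = inverse_2x2([[1.0, r12], [r12, 1.0]])

# Multiply each row of the reciprocal by its criterion correlation ...
weighted_rows = [[criterion[i] * x for x in reciprocal[i]] for i in range(2)]
# ... and then add the columns.
weights = [sum(row[j] for row in weighted_rows) for j in range(2)]

print([round(w, 3) for w in weights])
```

Once the reciprocal has been found, a different criterion requires only a new pair of multipliers, which is the first advantage claimed for the method.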
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The most important advantage of this method is that 
whatever the criterion, the variances and covariances of 
the regression coefficients are proportional to the cells of 
the above reciprocal matrix (Thomson, 1940, 16 ; Fisher, 
1925, 15 and 1922, 611). Their absolute values for any
given criterion are obtained by multiplying by $1 - r_m^2$,
the defect of the square of the multiple correlation from
unity, and dividing by the number of " degrees of freedom,"
which is for full correlations $N - p - 1$, where N is
the number of persons tested, and p the number of tests.
For partial correlations the degrees of freedom are reduced 
by the number of variables " partialled out." 

Thus in our example, where p = 4, if N had been 105,
$N - p - 1$ would be 100. The multiple correlation was
.83, and $1 - r_m^2 = .312$ (see page 94). The variances and
covariances of our four regression coefficients are in this
case equal to the reciprocal matrix multiplied by .00312:

     .0075   -.0042   -.0016   -.0017

    -.0042    .0061   -.0004    .0006

    -.0016   -.0004    .0042   -.0004

    -.0017    .0006   -.0004    .0038

The standard errors of the regression coefficients are the 
square roots of the diagonal elements : 

Regression coefficients   .390   .222   .018   .431
Standard errors           .087   .078   .065   .062

Significant?              Yes    ?      No     Yes

The correlations of the regression coefficients will be got 
by dividing each row and column by the square root of 
the diagonal element. We obtain : 

     1.00    -.62    -.28    -.31

     -.62    1.00    -.079    .12

     -.28    -.079   1.00    -.10

     -.31     .12    -.10    1.00

We can now calculate the standard error of the difference 
between any pair of the regression coefficients and see 
whether they differ significantly. Take, for example, those 
for Test 1 (.390) and Test 4 (.431). The difference is .041.
Its standard error is the square root of

.0075 + .0038 + 2 × .31 × .087 × .062 = .0146.
∴ standard error of .041 is .121.

The difference is therefore not significant when N = 105. 
Had N been larger it might have been. 
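
The whole of this calculation, from the matrix of variances and covariances to the standard error of a difference, can be sketched as below. The magnitudes are those of the text; the minus signs of the off-diagonal cells are an assumption (they are consistent with the positive sign of the product term in the calculation just given, the covariance of the coefficients of Tests 1 and 4 being negative). Function and variable names are hypothetical.

```python
import math

# Variances and covariances of the four regression coefficients.
# Off-diagonal signs are assumed, not taken from a legible table.
cov = [
    [.0075, -.0042, -.0016, -.0017],
    [-.0042, .0061, -.0004, .0006],
    [-.0016, -.0004, .0042, -.0004],
    [-.0017, .0006, -.0004, .0038],
]
coefficients = [.390, .222, .018, .431]

# Standard errors are the square roots of the diagonal elements.
standard_errors = [math.sqrt(cov[i][i]) for i in range(4)]

# Correlations: divide each row and column by the standard errors.
correlations = [[cov[i][j] / (standard_errors[i] * standard_errors[j])
                 for j in range(4)] for i in range(4)]


def se_of_difference(i, j):
    """Standard error of the difference between coefficients i and j."""
    return math.sqrt(cov[i][i] + cov[j][j] - 2 * cov[i][j])


difference = coefficients[3] - coefficients[0]
se = se_of_difference(0, 3)

print(round(difference, 3), round(se, 3))
```

The difference .041 is only about a third of its standard error, which is why it cannot be called significant with N = 105.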



ADDENDUM TO CHAPTER IV ON THE GEOMETRICAL 
PICTURE 

SHORTLY after the publication of the first edition, Mr. 
Babington Smith pointed out to me a difficulty which a 
reader might experience in understanding this chapter, 
if he had previously read about an apparently different 
space, in which there are as many orthogonal axes as there 
are persons. 

In this space (I may call it Wilson's) a test is represented
by a point whose co-ordinates are the scores of the
persons in that test. If the scores have been normalized 
(see footnote, page 6), then the distance from the origin 
to the test-point will be unity. The test-points will, in 
fact, be exactly the same points as those spoken of on page 
63, where unit distance was measured along each test 
vector in the positive direction. The space of Chapter 
IV is the same as the space defined by these points, a 
subspace of the larger space. The latter has as many 
dimensions as there are persons, the former as there are 
tests. 

The lines joining the origin to the test-points are the 
same lines as the test vectors of Chapter IV, and the 
cosines of the angles between them represent here, as 
there, the correlations between the tests : for this follows 
from the " cosine law " of solid or multidimensional 
geometry, since the normalized scores of the tests are 
direction cosines with regard to the axes equal in number 
to the persons. 

In this subspace I introduced the idea of each point 
representing a person, whose scores are the projections of

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that point on to the test vectors. Apparently this was an 
unfamiliar idea to some, in such a space, although I thought 
it was in common use. It had been used by Hotelling in 
his space (see Chapter V), where the test lines are ortho- 
gonal ; and he contemplates that space being squeezed 
and stretched into the space of Chapter IV (Hotelling, 
1933, 428) and refers to Thurstone's previous use of this 
subspace. Moreover, since writing the above I have no- 
ticed that Maxwell Garnett in 1919 used the representation 
of persons by points in the test space (B.J.P., 9, 348). And
anyhow the ordinary scatter-diagram does so, the diagram 
commonly made by drawing the test lines at right angles 
and using his scores as a person's co-ordinates. If in such 
an ordinary scatter-diagram the test axes are then moved 
towards one another until the cosine of the angle represents 
the correlation, the elliptical crowd of dots will become 
circular if standard scores have been used. (It is to be 
noted that although the axes have ceased to be orthogonal,
it is still the vertical projections on to each line
which represent a person's scores, not the oblique axes). 
For Sheppard showed in 1898 (Phil. Trans. Roy. Soc.,
A 192, page 101) that $r = \cos\{b\pi/(a + b)\}$ where the
scatter-diagram, with its axes drawn through the means,
has the quadrant-frequencies

        b   a
        a   b

It is not my experience that unsophisticated persons 
find difficulty with page 55 ; and as for the sophisticated, 
well, they oughtn't to. 
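
Sheppard's formula, quoted above, lends itself to a brief numerical sketch. It is assumed here that a is the frequency in each of the two quadrants where the deviations have like signs, and b the frequency in each of the two where they have unlike signs; the function name and the frequencies are hypothetical.

```python
import math


def sheppard_r(a, b):
    """r = cos(pi * b / (a + b)), with a the like-signed and b the
    unlike-signed quadrant frequency (assumed reading of the formula)."""
    return math.cos(math.pi * b / (a + b))


r_equal = sheppard_r(25, 25)     # quadrants equally filled: no correlation
r_perfect = sheppard_r(50, 0)    # no unlike-signed cases: perfect correlation
r_example = sheppard_r(40, 10)   # cosine of 36 degrees

print(round(r_equal, 6), r_perfect, round(r_example, 3))
```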
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FRANCIS F. MEDLAND (Pmka 1947, 12, 101-10) has tried 
nine methods of estimating communality, on a correlation 
matrix with 63 variables. A method entitled Centroid 
No. 1 method seemed to be best. A sub-group is chosen 
of from three to five tests which correlate most highly with 
the test whose communality is wanted. The highest correlation
t in each column of the sub-group is inserted in
the diagonal cell, and the columns summed. The grand
total is also found. Then the estimate of $h_1^2$ is

$$h_1^2 = \frac{(\Sigma r_1 + t_1)^2}{\Sigma r + \Sigma t}$$

where the numerator is the square of the column total,
and the denominator is the grand total. Thus if the correlations
of the sub-group were

    (.72)    .72     .63     .24

     .72    (.72)    .47     .59

     .63     .47    (.63)    .41

     .24     .59     .41    (.59)

    2.31    2.50    2.14    1.83   = 8.78

the estimate of $h_1^2$ would be

$$\frac{2.31^2}{8.78} = .608.$$

Clearly the same sub-group will usually serve for more than
one of its members. Thus from the above example $h_2^2$
can be estimated to be .712.
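
The Centroid No. 1 estimate described above can be sketched in a few lines, using the four-test sub-group of the example. The variable names are hypothetical; the procedure is the one stated: the highest correlation in each column is placed in the diagonal cell, the columns are summed, and the square of each column total is divided by the grand total.

```python
# Correlations of the sub-group, with the diagonal cells left empty.
sub_group = [
    [None, .72, .63, .24],
    [.72, None, .47, .59],
    [.63, .47, None, .41],
    [.24, .59, .41, None],
]
n = len(sub_group)

# Insert the highest correlation of each column in its diagonal cell.
filled = [row[:] for row in sub_group]
for j in range(n):
    filled[j][j] = max(sub_group[i][j] for i in range(n) if i != j)

column_totals = [sum(filled[i][j] for i in range(n)) for j in range(n)]
grand_total = sum(column_totals)

# Estimated communality: square of the column total over the grand total.
communalities = [total ** 2 / grand_total for total in column_totals]

print([round(t, 2) for t in column_totals], round(grand_total, 2))
print([round(h, 3) for h in communalities])
```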

A graphical method, for which the reader is referred to 
Medland's article, was about equally accurate but rather 
more laborious. 

BURT ROSNER (Pmka 1948, 13, 181-4) has given an algebraic
solution for the communalities depending upon the
Cayley-Hamilton theorem that any square matrix satisfies
its own characteristic equation, but adds that the method
" is not at all suited for practical purposes. The com- 
putational labour is prohibitive." It is however interest- 
ing theoretically and may suggest new advances. 
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PARAGRAPHS 

1. Textbooks on matrix algebra. 2. Matrix notation.
3. Spearman's Theory of Two Factors. 4. Multiple common
factors. 5. Orthogonal rotations. 6. Orthogonal transformation
from the two-factor equations to the sampling equations.
7. Hotelling's "principal components." 8. The pooling square.
9. The regression equation. 9a. Relations between two sets
of variates. 10. Regression estimates of factors. 11. Direct
and indirect vocational advice. 12. Computation methods.
13. Bartlett's estimates of factors. 14. Indeterminacy. 15.
Finding g saturations from an imperfectly hierarchical battery.
16. Sampling errors of tetrad-differences. 17. Selection from
a multivariate normal population. 17a. Maximum likelihood
estimation (by D. N. Lawley). 18. Reciprocity of loadings and
factors in persons and traits. 19. Oblique factors. Structure
and pattern. 19a. Second-order factors. 20. Boundary conditions.
21. The sampling of bonds.

1. Textbooks on matrix algebra. Some knowledge of 
matrix algebra is assumed, such as can be gained from the 
mathematical introduction to L. L. Thurstone's The Vectors 
of Mind (Chicago, 1935) ; Turnbull and Aitken's Theory 
of Canonical Matrices, Chapter I (London and Glasgow, 
1932) ; H. W. Turnbull's The Theory of Determinants,
Matrices, and Invariants, Chapters I-V (London and
Glasgow, 1929) ; and M. Bôcher's Introduction to Higher
Algebra, Chapters II, V, and VI (New York, 1936). 

I have adopted Thurstone's notation in sections 19 
and 19a of the mathematical appendix, and in Chapters 
XVIII and XIX in describing his work. But I have not 
made the change elsewhere because readers would then be 
incommoded in consulting my own former papers. 

The chief differences are as follows : 

My M is Thurstone's F, for centroid factors, my Z is
Thurstone's $S \div \sqrt{N}$, and my F is Thurstone's $P \div \sqrt{N}$.
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2. Matrix notation. Let X be the matrix of raw scores 
of p persons in n tests, with n rows and p columns ; and 
when normalized by rows, let it be denoted by Z. The 
letters z and Z in the text of this book mean standardized 
scores, which are used in practical work, but in this 
appendix they mean normalized scores, so that 

ZZ' = R . . . (1) 

the matrix of correlations between n tests. 

For many purposes it is convenient to think of solid 
matrices like Z as column (or row) vectors of which each 
element represents a row (or column). Thus Z can be 
thought of as a column vector z, of which each element 
represents in a collapsed form a row of test scores. Thus 
with three tests and four persons 



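$$z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} z_{11} & z_{12} & z_{13} & z_{14} \\ z_{21} & z_{22} & z_{23} & z_{24} \\ z_{31} & z_{32} & z_{33} & z_{34} \end{pmatrix} = Z. \qquad (2)$$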



In the theory of mental factors each score is represented 
as a loaded sum of the normalized factors f, the loadings
being different for each test, i.e. 

z = Mf (specification equations) (3) 

where M is the matrix of loadings, and f the vector of v 
factors, collapsed into a column from F, the full matrix, 
of dimensions v X p. 

We note that p = number of persons,
n = number of tests,
v = number of factors.

The dimensions of M are n × v. Equation (3) represents
n simultaneous equations, and the form Z = MF represents
np simultaneous equations.
We now have

R = ZZ' = (MF)(MF)' = MFF'M' . (4)

If the factors are orthogonal, we have 

FF' = I (5) 

the unit matrix, and therefore 

R = MM' (6) 

The resemblance in shape between this and (1)
leads to a parallelism between formulae concerning persons
and factors (Thomson, 1935b, 75 ; Mackie, 1928a, 74, and
1929, 34).

3. Spearman's Theory of Two Factors assumes that M 
is of the special form 



$$M = \begin{pmatrix} l_1 & m_1 & & & \\ l_2 & & m_2 & & \\ \vdots & & & \ddots & \\ l_n & & & & m_n \end{pmatrix} \qquad (7)$$

and therefore

$$R = ll' + M_1^2 \qquad (8)$$

where $M_1$ is the diagonal matrix which forms the right-hand
end of M, and $l$ is the first column of M. In this
form it is clear that R is of rank 1 except for its principal
diagonal. Its component $ll'$ is the " reduced correlational
matrix " of the Spearman case, and is entirely of rank 1.
The elements $l_1^2, l_2^2, \ldots, l_n^2$, which form the principal
diagonal of $ll'$, are called " communalities."
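
The special form of M in the Theory of Two Factors can be illustrated numerically. The sketch below takes four hypothetical g loadings, builds M as the column of those loadings followed by the diagonal matrix of specific loadings (each chosen so that the test has unit variance), and forms MM'. The off-diagonal correlations are then products of two loadings and every tetrad-difference vanishes.

```python
import math

l = [.9, .8, .7, .6]                      # hypothetical g loadings
m = [math.sqrt(1 - x * x) for x in l]     # specific loadings giving unit variance
n = len(l)

# M = (l | M1): the column l followed by the diagonal matrix of specifics.
M = [[l[i]] + [m[i] if j == i else 0.0 for j in range(n)] for i in range(n)]

# R = MM'
R = [[sum(M[i][k] * M[j][k] for k in range(n + 1)) for j in range(n)]
     for i in range(n)]

# One tetrad-difference of the off-diagonal correlations.
tetrad = R[0][2] * R[1][3] - R[0][3] * R[1][2]

print(round(R[0][1], 2), round(R[0][0], 2), round(tetrad, 12))
```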

4. Multiple common factors. When more than one 
common factor is present, M takes the form 

$$M = (M_0 \mid M_1) \qquad (9)$$

where $M_0$ is the matrix of loadings of the common factors,
represented in the Spearman case by the simple column $l$.
We have then

$$R = MM' = M_0 M_0' + M_1^2 \qquad (10)$$

where the " reduced correlation matrix " $M_0 M_0'$ is of
rank r, the number of common factors, and is identical
with R except for having " communalities " in its principal
diagonal.

5. Orthogonal rotations. If we express the v factors $f$ in
terms of w new factors $\varphi$ by the equation

$$f = A\varphi \qquad (11)$$

where A is a matrix of v rows and w columns, we have

$$z = Mf = MA\varphi \qquad (12)$$

an expression of the tests z as linear loaded sums of a
different set of factors, with a matrix of loadings MA.
If

$$AA' = I \qquad (13)$$

the new factors $\varphi$ are orthogonal like the old ones. They
can be as numerous as we like, but not less than the number
of tests unless the matrix R is singular. (12) represents a
rigid rotation of the orthogonal axes $f$ into new positions,
with dimensions added or abolished.

6. The sampling theory. The following transformation 
is of interest as showing the connexion between the 
Theory of Two Factors and the Sampling Theory (Thomson,
1935b, 85). We shall write it out for three tests only,
but it is quite general. Consider the orthogonal matrix : 



III 


mil Iml Urn 


mml mlm Imm 


mmm 


mil 
Iml 
Urn 


Ill mml mlm 
mml III Imm 
mlm Imm III 


Iml llm mmm 
mil mmm llm 
mmm mil Iml 


Imm 
mlm 
mml 


mml 
mlm 
Imm 


Iml mil mmm 
llm mmm - mil 
mmm llm Iml 


Ul Imm mlm 
Imm III mml 
mlm mml Ul 


llm 
Iml 
mil 


mmm 


Imm mlm mml 


llm Iml mil 


-III 



(14) 



wherein the omitted subscripts 1, 2, and 3 are to be
understood as existing always in that order, so that mll
means $m_1 l_2 l_3$.

If we take for A in Equation (12) the first four rows
of this orthogonal matrix, and for M the Spearman form
(7) with three tests, the result is to transfer to eight new
factors, yielding:

$$\begin{aligned}
z_1 &= l_2 l_3 \varphi_1 + m_2 l_3 \varphi_3 + l_2 m_3 \varphi_4 + m_2 m_3 \varphi_7 \\
z_2 &= l_1 l_3 \varphi_1 + m_1 l_3 \varphi_2 + l_1 m_3 \varphi_4 + m_1 m_3 \varphi_6 \\
z_3 &= l_1 l_2 \varphi_1 + m_1 l_2 \varphi_2 + l_1 m_2 \varphi_3 + m_1 m_2 \varphi_5
\end{aligned} \qquad (15)$$

Each z is here in normalized units. If, however, we
change to new units by multiplying the three equations
by $l_1$, $l_2$, and $l_3$ respectively, we have:

$$\begin{aligned}
l_1 z_1 &= l_1 l_2 l_3 \varphi_1 + l_1 m_2 l_3 \varphi_3 + l_1 l_2 m_3 \varphi_4 + l_1 m_2 m_3 \varphi_7 \\
l_2 z_2 &= l_1 l_2 l_3 \varphi_1 + m_1 l_2 l_3 \varphi_2 + l_1 l_2 m_3 \varphi_4 + m_1 l_2 m_3 \varphi_6 \\
l_3 z_3 &= l_1 l_2 l_3 \varphi_1 + m_1 l_2 l_3 \varphi_2 + l_1 m_2 l_3 \varphi_3 + m_1 m_2 l_3 \varphi_5
\end{aligned} \qquad (16)$$

and the variates $l_1 z_1$, $l_2 z_2$, and $l_3 z_3$ are now susceptible of
the explanation that each is composed of $l_i^2 N$ small equal
components drawn at random from a pool of N such
components, all-or-none in nature. In that case $l_1^2 l_2^2 l_3^2 N$
components would probably appear in all three drawings
($\varphi_1$) ; $l_1^2 l_2^2 m_3^2 N$ components would probably appear in the
first two drawings, but not in the third ($\varphi_4$) ; and so on
down to $m_1^2 m_2^2 m_3^2 N$ components, which would not appear
at all ($\varphi_8$, which is missing from the equations).

The transformation can, of course, be reversed, and the 
sampling theory equations converted into the two-factor 
equations. 
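
That the transformed equations still imitate the two-factor correlations can be checked numerically. The sketch below assumes the three equations to read z1 = l2 l3 φ1 + m2 l3 φ3 + l2 m3 φ4 + m2 m3 φ7, z2 = l1 l3 φ1 + m1 l3 φ2 + l1 m3 φ4 + m1 m3 φ6, and z3 = l1 l2 φ1 + m1 l2 φ2 + l1 m2 φ3 + m1 m2 φ5, with orthogonal unit factors; this reading is a reconstruction, and the loadings are hypothetical.

```python
import math

l1, l2, l3 = .9, .8, .7                            # hypothetical g loadings
m1, m2, m3 = (math.sqrt(1 - x * x) for x in (l1, l2, l3))

# Loadings of z1, z2, z3 on the eight new factors phi_1 ... phi_8.
Z = [
    [l2 * l3, 0, m2 * l3, l2 * m3, 0, 0, m2 * m3, 0],
    [l1 * l3, m1 * l3, 0, l1 * m3, 0, m1 * m3, 0, 0],
    [l1 * l2, m1 * l2, l1 * m2, 0, m1 * m2, 0, 0, 0],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# With orthogonal unit factors the correlations are the inner products of rows.
R = [[dot(Z[i], Z[j]) for j in range(3)] for i in range(3)]

print([[round(x, 4) for x in row] for row in R])
```

Each test keeps unit variance, and each correlation is again the product of two g loadings, so that the sampling form and the two-factor form describe the same correlations.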

7. Hotelling's "principal components" are the principal 
axes of the ellipsoids of equal density 

$z'R^{-1}z = \text{constant}$ .... (17) 

when the test vectors are orthogonal axes (Hotelling, 1933). 
To find the principal axes involves finding the latent 
roots of $R^{-1}$. The Hotelling process consists of (a) a 
rotation of the axes from the orthogonal test axes to the 
directions of the principal axes ; and (b) a set of strains 
and stresses along these new axes to standardize the factors, 
making the ellipsoid spherical and the original axes oblique. 
The transformation from the tests to the Hotelling factors 
$y$ being from Equation (3) 

$z = My$ (M square) 

the ellipsoids (17) become 

$\text{constant} = z'R^{-1}z = y'(M'R^{-1}M)y = y'y$ . (18) 

since they become spheres. Therefore we must have 

$M'R^{-1}M = I$ . . (19) 

The locus of the mid points of chords of $z'R^{-1}z$ whose 
direction cosines are $h'$ is the plane $h'R^{-1}z = 0$, and if this 
is a principal plane it is at right angles to the chords it 
bisects, i.e. 

$h'R^{-1} = \lambda h'$ 

which has non-trivial solutions only for 

$|R^{-1} - \lambda I| = 0$ 

the roots $\lambda$ of which are the "latent roots" of $R^{-1}$, while 
each $h'$ is a "latent vector." 

Now, if H is the matrix of normalized latent vectors of 
$R^{-1}$, we have 

$H'R^{-1}H = \Lambda$ 

where $\Lambda$ is the diagonal matrix of the latent roots of $R^{-1}$ ; 
so that a solution for M corresponding to rotation to the 
principal axes and subsequent change of units to give a 
sphere is seen to be 

$M = H\Lambda^{-1/2}$ . . (20) 

The latent vectors of R are the same as those of $R^{-1}$, 
or of any power of R, and Hotelling's process described 
in the text (Chapter V) finds the latent roots (forming the 
diagonal matrix D) and the latent vectors (forming H) of 
R. We then have 

$M = HD^{1/2}$ . . . (21) 

For the convergence of the process, see Hotelling's paper 
of 1933, pages 14 and 15. 

Since in Hotelling analyses M is square, we can write 

$y = M^{-1}z = (HD^{1/2})^{-1}z = D^{-1/2}H^{-1}z = D^{-1}(D^{1/2}H')z = D^{-1}M'z$ . (22) 

Each factor y, that is, can be found from a column of 
the matrix M, divided by the corresponding latent root, 
used as loadings of the test scores z. 
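
The relations of this paragraph can be checked numerically on the smallest possible case. The sketch below is only an illustration: it assumes two tests with a correlation of 0.6 (an invented value), for which the latent roots and vectors of R are known in closed form, and it uses plain Python lists rather than any matrix library.

```python
import math

def matmul(a, b):
    """Product of two matrices held as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def hotelling_2x2(r):
    """Loadings M = H D^(1/2) for the correlation matrix [[1, r], [r, 1]]."""
    roots = [1 + r, 1 - r]                      # latent roots of R
    s = 1 / math.sqrt(2)
    h = [[s, s], [s, -s]]                       # normalized latent vectors (columns)
    m = [[h[i][j] * math.sqrt(roots[j]) for j in range(2)] for i in range(2)]
    return roots, m

def factor_scores(z, roots, m):
    """y = D^-1 M' z: a column of M divided by its latent root, applied to z."""
    mt = transpose(m)
    return [sum(mt[j][i] * z[i] for i in range(2)) / roots[j] for j in range(2)]

r = 0.6                                         # assumed correlation of the two tests
roots, m = hotelling_2x2(r)
reproduced = matmul(m, transpose(m))            # M M' should give back R
z = [1.0, 0.5]                                  # one person's standardized scores
y = factor_scores(z, roots, m)
back = [sum(m[i][j] * y[j] for j in range(2)) for i in range(2)]   # z = M y
```

Here `reproduced` returns the correlation matrix and `back` returns the original scores, which is what a square M requires.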

8. The pooling square. If the matrix of correlations of 
a + b variates is : 

$\begin{pmatrix} R_{aa} & R_{ab} \\ R_{ba} & R_{bb} \end{pmatrix}$ (23) 

and if the standardized variates a are multiplied by weights 
u, the standardized variates b by weights w, and each set 
of scores summed to make two composite scores, the 
resulting variances and covariances are : 

$\begin{pmatrix} u'R_{aa}u & u'R_{ab}w \\ w'R_{ba}u & w'R_{bb}w \end{pmatrix}$ (24) 

as can be seen by writing out the latter expressions at 
length. The battery intercorrelation is therefore 

$\dfrac{u'R_{ab}w}{\sqrt{u'R_{aa}u \times w'R_{bb}w}}$ (25) 

If weights are applied to raw scores, each applied weight 
must be multiplied by the corresponding pre-existing standard 
deviation, in (25). 

If there is only one variate in the a team, (25) becomes 

$\dfrac{w'r_{ba}}{\sqrt{w'R_{bb}w}}$ (26) 

where $r_{ba}$ represents a whole column of correlation coeffi- 
cients. The values of w for which this reaches its maximum 
value will satisfy the equation 

$\dfrac{\partial}{\partial w}\left(\dfrac{w'r_{ba}}{\sqrt{w'R_{bb}w}}\right) = 0$ (27) 

that is 

$w = \text{a scalar} \times R_{bb}^{-1}r_{ba}$ . . (28) 

consistent with the ordinary method of deducing regression 
coefficients. 
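
A small numerical sketch may make the pooling square concrete. It assumes one criterion in the a team and two tests in the b team, with invented correlations, and compares equal weights with weights proportional to $R_{bb}^{-1}r_{ba}$. The function names are hypothetical helpers, not part of any library.

```python
import math

def quad(u, r, w):
    """Bilinear form u' R w for lists u, w and a nested-list matrix r."""
    return sum(u[i] * r[i][j] * w[j]
               for i in range(len(u)) for j in range(len(w)))

def battery_correlation(u, w, r_aa, r_ab, r_bb):
    """Correlation of the two weighted composites: the ratio in (25)."""
    return quad(u, r_ab, w) / math.sqrt(quad(u, r_aa, u) * quad(w, r_bb, w))

# One criterion (team a) and two predictors (team b); illustrative values only.
r_aa = [[1.0]]
r_ab = [[0.5, 0.4]]                 # correlations of the criterion with the b tests
r_bb = [[1.0, 0.3], [0.3, 1.0]]

# Weights proportional to R_bb^-1 r_ba, worked out by hand for the 2 x 2 case.
det = r_bb[0][0] * r_bb[1][1] - r_bb[0][1] * r_bb[1][0]
w_best = [(r_bb[1][1] * r_ab[0][0] - r_bb[0][1] * r_ab[0][1]) / det,
          (r_bb[0][0] * r_ab[0][1] - r_bb[1][0] * r_ab[0][0]) / det]

r_equal = battery_correlation([1.0], [1.0, 1.0], r_aa, r_ab, r_bb)
r_best = battery_correlation([1.0], w_best, r_aa, r_ab, r_bb)
```

With these figures `r_best` is the multiple correlation of the criterion with the two tests, and no other weighting of the b team, equal weights included, can exceed it.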

9. The regression equation. If $z_0$ is the one variate in 
the a team, and $z$ are the b team, and if 

$\hat{z}_0 = w'z$ . . . . (29) 

we wish to make $S(z_0 - \hat{z}_0)^2$ a minimum, that is 

$\dfrac{\partial}{\partial w}S(z_0 - w'z)^2 = 0$ 

$Sz_0z' = w'Szz'$ 

$\hat{z}_0 = r_{ba}'R_{bb}^{-1}z$ . . . (30) 

If R is the matrix of correlations of all the tests including 
$z_0$, the regression estimate of any one of the tests from a 
weighted sum of the others is given by 

determinant $R_z = 0$ (31) 

where $R_z$ is R with the row corresponding to the variate 
to be estimated replaced by the row of variates. 

9a. Relations between two sets of variates (Hotelling 
1935a, 1936 ; M. S. Bartlett 1948). If two sets of variates 
have correlation coefficients 

$\begin{pmatrix} R_{aa} & R_{ab} \\ R_{ba} & R_{bb} \end{pmatrix}$ or $\begin{pmatrix} A & C' \\ C & B \end{pmatrix}$ 

and if the variates of the B team are fitted with weights b, 
then the correlations of the B team, thus weighted, with 
the separate tests of the A team are given by 

$C'b$ (31.1) 

and the square of the correlation coefficient between the 
two teams is then 

$\lambda = \dfrac{b'CA^{-1}C'b}{b'Bb}$ (31.2) 

The maximum intercorrelation, and other points of in- 
flexion in $\lambda$, will be given by 

$d\lambda/db = 0$ 
i.e. $(CA^{-1}C' - \lambda B)b = 0$ . . . (31.3) 

a set of homogeneous equations in b. We must therefore 
have 

$|CA^{-1}C' - \lambda B| = 0$ . . . (31.4) 

an equation for $\lambda$ with as many non-zero roots as the num- 
ber of variates in the smaller team. For any one of these 
roots $\lambda$, the weights b are proportional to the co-factors of 
any row of $(CA^{-1}C' - \lambda B)$. The corresponding weights 
a for the A team are then found by condensing the team B 
(using weights b) to a single variate and carrying out an 
ordinary regression calculation. 

The result is to "factorize" each team into as many 
orthogonal axes as there are variates. These axes are 
related to one another in pairs corresponding to the roots $\lambda$. 
Each axis is orthogonal to all the others except its own 
opposite number in the space of the other team, arising 
from the same root $\lambda$ as it does, to which axis it is inclined 
at an angle $\arccos\sqrt{\lambda}$. Where one team has m more 
variates than the other, m of the roots will be zeros and 
the corresponding axes will be at right angles to the whole 
space of the other team. This form of factorizing has been 
called by M. S. Bartlett (1948) external factorizing, since 
the position of the "factors" or orthogonal axes in each 
team, in each space, is dictated by the other team. 

The weightings corresponding to the largest root give 
the closest possible correlation of the two weighted teams. 
If the two teams are duplicate forms of the same tests, this 
is the maximum attainable battery or team reliability 
(Thomson 1940b, 1947, 1948). In this case Peel (Nature, 
1947) has shown that a simpler equation than (31.4) gives 
the required roots. If $\lambda = \mu^2$ Peel's equation is 

$|C - \mu A| = 0$ . . . (31.5) 
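
For two teams of two tests each the determinantal equation for the squared team correlations is only a quadratic, so it can be solved directly. The following sketch assumes that case; the correlation values and the function names are invented for illustration, with A and B the within-team matrices and C the matrix of cross-correlations (rows for the B team, columns for the A team).

```python
import math

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def squared_team_correlations(a, b, c):
    """Roots of |C A^-1 C' - lambda B| = 0 for two teams of two tests."""
    ct = [[c[0][0], c[1][0]], [c[0][1], c[1][1]]]
    g = mul2(mul2(c, inv2(a)), ct)                      # G = C A^-1 C'
    # det(G - lambda B) = det(B) lambda^2 - k lambda + det(G)
    k = (g[0][0] * b[1][1] + g[1][1] * b[0][0]
         - g[0][1] * b[1][0] - g[1][0] * b[0][1])
    disc = math.sqrt(max(k * k - 4 * det2(b) * det2(g), 0.0))
    return [(k + disc) / (2 * det2(b)), (k - disc) / (2 * det2(b))]

A = [[1.0, 0.5], [0.5, 1.0]]        # correlations within the A team (assumed)
B = [[1.0, 0.4], [0.4, 1.0]]        # correlations within the B team (assumed)
C = [[0.6, 0.3], [0.3, 0.5]]        # cross-correlations, B rows by A columns (assumed)

roots = squared_team_correlations(A, B, C)
angles = [math.degrees(math.acos(math.sqrt(lam))) for lam in roots]
```

The first entry of `roots` is the largest squared correlation attainable between the two weighted teams, and `angles` gives the inclinations of the paired axes in degrees.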

10. Regression estimates of factors. When in the speci- 
fications 

$z = Mf$ . . . . (3) 

the factors outnumber the tests, they cannot be measured 
but only estimated. To all men with the same set of 
scores $z$ will be attributed the same set of estimated factors 
$\hat{f}$, though their "true" factors may be different. The 
regression method of estimation minimizes the squares of 
the discrepancies between $\hat{f}$ and $f$, summed over the men. 
The regression equation (31) will be for one factor $f_i$ 

$\begin{vmatrix} \hat{f}_i & z' \\ m_i & R \end{vmatrix} = 0$ (32) 

where $m_i$ is a column of M. Expanding, we have 

$\hat{f}_i = m_i'R^{-1}z$ 

and in general 

$\hat{f} = M'R^{-1}z$ . . . (33) 

or, separating the common factors and the specifics 

$\hat{f}_0 = M_0'R^{-1}z$ . . . (34) 
$\hat{f}_1 = M_1R^{-1}z$ . . . (35) 

the latter of which shows that we know the proportionate 
weights for each specific (the rows of $R^{-1}$) even before we 
know whether that specific exists (Wilson, 1934b, 194). 
The matrix of covariances of the estimated factors is 

$M'R^{-1}M = \begin{pmatrix} M_0'R^{-1}M_0 & M_0'R^{-1}M_1 \\ M_1R^{-1}M_0 & M_1R^{-1}M_1 \end{pmatrix}$ (36) 

a square idempotent matrix of order equal to the number 
of factors, but trace only equal to the number of tests. 

For one common factor, (34) reduces to Spearman's 
estimate 

$\hat{g} = \dfrac{1}{1+S}\sum_i \dfrac{r_{ig}}{1 - r_{ig}^2}\,z_i$ 

where $S = \sum_i \dfrac{r_{ig}^2}{1 - r_{ig}^2}$ 

while $K = M_0'R^{-1}M_0$ in (36) reduces to $S/(1 + S)$, the 
variance of $\hat{g}$. 

10a. Ledermann's short cut (1938a, 1939b). The above 
requires the calculation of the reciprocal of the large square 
matrix R. Ledermann's short cut only requires the reci- 
procal of a matrix of order equal to the number of common 
factors. We have 

$R = M_0M_0' + M_1^2$ . . . (10) 

and the identity 

$M_0'M_1^{-2}(M_0M_0' + M_1^2) = (M_0'M_1^{-2}M_0 + I)M_0' = (J + I)M_0'$ say. 

Premultiplying by $(I + J)^{-1}$ and postmultiplying by $R^{-1}$ 
we reach 

$(I + J)^{-1}M_0'M_1^{-2} = M_0'R^{-1}$ . . (36.1) 

and the left-hand quantity can then be used in equation 
(34). 
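
The saving can be seen in a minimal sketch with one common factor, where J is a single number and no matrix need be inverted at all. The three loadings are invented values, and the Gauss-Jordan routine is only a stand-in for whatever method would be used to invert R the long way.

```python
def inverse(m):
    """Gauss-Jordan inverse of a small, well-conditioned square matrix."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

loadings = [0.9, 0.8, 0.6]                      # M_0: one common factor (assumed values)
unique = [1 - l * l for l in loadings]          # diagonal of M_1^2
n = len(loadings)
R = [[loadings[i] * loadings[j] + (unique[i] if i == j else 0.0)
      for j in range(n)] for i in range(n)]

# Long way: weights M_0' R^-1, needing the reciprocal of the full matrix R.
R_inv = inverse(R)
long_way = [sum(loadings[i] * R_inv[i][j] for i in range(n)) for j in range(n)]

# Short cut: (I + J)^-1 M_0' M_1^-2, where J is here a single number.
J = sum(l * l / u for l, u in zip(loadings, unique))
short_cut = [l / u / (1 + J) for l, u in zip(loadings, unique)]
```

The two lists of weights agree to rounding error. With several common factors J becomes a small square matrix, and only that matrix has to be inverted.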

11. Direct and indirect vocational advice. If $z_0$ is an 
occupation and $z$ a battery of tests, the estimate of a 
candidate's occupational ability is 

$\hat{z}_0 = r_0'R^{-1}z$ .... (37) 

where the $r_0$ are the correlations of the occupation with the 
tests. If $z_0$ can be specified in terms of the common 
factors of $z$, and a specific $s_0$ independent of $z$, then an 
indirect estimate of $z_0$ via the estimated $\hat{f}_0$ is possible. We 
have 

$z_0 = m_0'f_0 + s_0$ (38) 

where $m_0'$ is a row of occupation loadings for the common 
factors $f_0$ of $z$, and also 

$\hat{f}_0 = M_0'R^{-1}z$ 

Substitution in (38), assuming an average $s_0$ (= 0) 
gives 

$\hat{z}_0 = m_0'M_0'R^{-1}z$ . . (39) 

But 

$m_0'M_0' = r_0'$ .... (40) 

and (39) is identical with (37) (Thomson, 1936b). If, how- 
ever, $s_0$ is not independent of the specifics $s$ of the battery, 
(40) will not hold, and the estimate (39) made via an estima- 
tion of the factors will not agree with the correct estimate (37). 

12. Computation methods. The "Doolittle" method of 
computing regression coefficients is widely used in America 
(Holzinger, 1937b, 32). Aitken's method, used and 
explained in the text, is in the present author's opinion 
superior (Aitken, 1937a and b, with earlier references). 
Regression calculations and many others are all special 
cases of the evaluation of a triple matrix product $XY^{-1}Z$, 
where Y is square and non-singular, and X and Z may 
be rectangular. The Aitken method writes these matrices 
down in the form 

    Y | Z 
    X | 

and applies pivotal condensation until all entries to the 
left of the vertical line are cleared off. All pivots must 
originate from elements of Y. By giving X and Z special 
values (including the unit matrix I) the most varied 
operations can be brought under the one scheme (see 
Chapter VIII, Section 7). 
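
The idea of clearing everything to the left of the line can be sketched in a few lines of code. This is not Aitken's tabular layout as set out in the text; it is a plain elimination on the bordered array, and it assumes the convention of entering X with its sign reversed and a block of zeros beside it, so that the block left at the bottom right is $XY^{-1}Z$ itself. The function name is hypothetical.

```python
from fractions import Fraction

def triple_product(x, y, z):
    """Evaluate X Y^-1 Z by pivotal condensation of the array [Y Z; -X 0]."""
    n, p, q = len(y), len(x), len(z[0])
    top = [[Fraction(v) for v in y[i]] + [Fraction(v) for v in z[i]]
           for i in range(n)]
    bottom = [[-Fraction(v) for v in x[i]] + [Fraction(0)] * q for i in range(p)]
    rows = top + bottom
    for col in range(n):
        # Every pivot is taken from the rows that began as Y.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n + p):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    # The bottom rows now hold zeros to the left of the line.
    return [row[n:] for row in rows[n:]]

Y = [[2, 1], [1, 3]]
X = [[1, 0]]
Z = [[1], [2]]
result = triple_product(X, Y, Z)
```

Taking X as a row of criterion correlations, Y as the correlation matrix of the tests and Z as the unit matrix would give a row of regression coefficients by the same routine. Exact fractions are used so that small worked examples can be checked by hand.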

13. Bartlett's estimates of factors. We have $z = M_0f_0 + 
M_1f_1$, where $f_0$ and $f_1$ are column vectors of the common 
and specific factors respectively and $M_1$ is a diagonal 
matrix. Bartlett now makes the estimates, here written 
$\tilde{f}_0$ to distinguish them from the regression estimates 
$\hat{f}_0$, such as will minimize the sum of the squares of each 
person's specifics over the battery of tests, i.e. 

$\dfrac{\partial}{\partial f_0}(f_1'f_1) = 0$, where $f_1 = M_1^{-1}(z - M_0f_0)$, 

$(-M_1^{-1}M_0)'(M_1^{-1}z - M_1^{-1}M_0\tilde{f}_0) = 0$ 

$M_0'M_1^{-2}z = M_0'M_1^{-2}M_0\tilde{f}_0 = J\tilde{f}_0$, say, 

$\tilde{f}_0 = J^{-1}M_0'M_1^{-2}z$ . (41) 
(Bartlett, 1937, 100.) 

One could also find the estimated specifics as 

$\tilde{f}_1 = (I - M_1^{-1}M_0J^{-1}M_0'M_1^{-1})M_1^{-1}z$ . . (42) 

Substituting 

$z = [M_0 \; M_1]\begin{pmatrix} f_0 \\ f_1 \end{pmatrix}$ 

we get for the relation between $\tilde{f}$ and $f$ 

$\begin{pmatrix} \tilde{f}_0 \\ \tilde{f}_1 \end{pmatrix} = \begin{pmatrix} I & J^{-1}M_0'M_1^{-1} \\ 0 & I - M_1^{-1}M_0J^{-1}M_0'M_1^{-1} \end{pmatrix}\begin{pmatrix} f_0 \\ f_1 \end{pmatrix}$ (43) 

and for the covariances of $\tilde{f}$ we get 

$\begin{pmatrix} I + J^{-1} & 0 \\ 0 & I - M_1^{-1}M_0J^{-1}M_0'M_1^{-1} \end{pmatrix}$ (44) 

The error variances and covariances of the common 
factors are 

$E\{(\tilde{f}_0 - f_0)(\tilde{f}_0 - f_0)'\} = J^{-1}$ . (45) 
(Bartlett, 1937, 100.) 

When there is only one common factor, J becomes the 
familiar quantity 

$S = \sum_i \dfrac{r_{ig}^2}{1 - r_{ig}^2}$ 

(Bartlett, 1935, 200.) 
As was first noted by Ledermann * 

$I + J^{-1} = (M_0'R^{-1}M_0)^{-1} = K^{-1}$ . (46) 

(quoted by Thomson, 1938b) ; and using this we see that 
the back estimates of the original scores from the regression 
estimates $\hat{f}_0$ are identical with the insertion of Bartlett's 
estimates $\tilde{f}_0$ in the common-factor part of the specification 
equations, viz. 

$M_0K^{-1}M_0'R^{-1}z = M_0J^{-1}M_0'M_1^{-2}z$ . . (47) 

(Thomson, 1938a.) 

Bartlett has pointed out that, using the same identity, in 
the form $K = J(I - K)$, it is easy to establish the rever- 
sible relation between his estimates and regression esti- 
mates 

$\hat{f}_0 = K\tilde{f}_0$, $\tilde{f}_0 = K^{-1}\hat{f}_0$ (48) 

(Bartlett, 1938.) 
* Letter of October 23, 1937, to Thomson. 

and he summarizes their different interpretation and prop- 
erties by the formulae 

$E\{\hat{f}_0\} = E\{f_0\} = 0$, $E\{(\hat{f}_0 - f_0)(\hat{f}_0 - f_0)'\} = I - K$ (49) 

$E_1\{\tilde{f}_0\} = f_0$, $E_1\{(\tilde{f}_0 - f_0)(\tilde{f}_0 - f_0)'\} = J^{-1} = K^{-1}(I - K)$ . (50) 

where $E$ denotes averaging over all persons, $E_1$ over all 
possible sets of tests (comparable with the given set in 
regard to the amount of information on the group 
factors $f_0$). 
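
With a single common factor the two kinds of estimate differ only by a scalar, which a short calculation makes plain. The sketch below uses invented loadings and one invented set of standardized scores; J is then the single number S and K is S/(1 + S).

```python
loadings = [0.9, 0.8, 0.6]                      # assumed g saturations
unique = [1 - l * l for l in loadings]          # specific variances
z = [1.2, 0.4, -0.3]                            # one person's standardized scores

J = sum(l * l / u for l, u in zip(loadings, unique))    # the quantity S
K = J / (1 + J)

# M_0' M_1^-2 z: each score weighted by loading over specific variance.
weighted = sum(l * s / u for l, u, s in zip(loadings, unique, z))

bartlett = weighted / J             # Bartlett's estimate, equation (41)
regression = weighted / (1 + J)     # regression estimate, via the short cut of 10a

error_var_bartlett = 1 / J          # equation (45)
error_var_regression = 1 - K        # equation (49)
```

The regression estimate is the Bartlett estimate shrunk by the factor K. It has the smaller error variance over persons, while Bartlett's estimate is the one that averages to the true factor over comparable sets of tests.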

14. Indeterminacy. The fact that estimated factors, if 
the factors outnumber the tests, necessarily have less than 
unit variance has sometimes been expressed in the case of 
one common factor by postulating an indeterminate 
vector $i$ whose variance completes unity. This $i$ may be 
regarded as the usual error of estimation, and is a function 
of the specific abilities (Thomson, 1934b). The fact that 
$M'R^{-1}M$ in Eqn. (36) is of rank less than its order also 
expresses the indeterminacy, and allows the factors to be 
rotated to different positions which nevertheless fulfil all 
the required conditions. In the hierarchical case the 
transformation which effects this is (Thomson, 1935a) 

$f = B\varphi$ .... (51) 

where B means the required number of rows of 

$B = I - 2qq'/q'q$ . . . (52) 

in which 

$q_i = l_i/m_i$ (see Equation 7) . (53) 

as far as there exist tests, after which q is arbitrary. 
For 

$z = Mf = MB\varphi = M\varphi$ 

since 

$MB = M$ ..... (54) 

and z is thus expressed by identical specification equations 
in terms of new factors $\varphi$. For such transformations in the 
case of multiple factors see Thomson, 1936a, 140 ; and 
Ledermann, 1938c. 
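
The hierarchical case can be tried out on a small battery. The sketch below assumes three tests with invented g saturations, so that M has one general column and three specific columns. The vector q is built from the ratios of general to specific loadings, with a leading element of -1 supplied here as an assumption so that Mq vanishes.

```python
loadings = [0.9, 0.8, 0.6]                          # assumed g saturations l_i
spec = [(1 - l * l) ** 0.5 for l in loadings]       # specific loadings m_i
n = len(loadings)

# M = [l | diag(m)]: n tests, one general factor plus n specifics.
M = [[loadings[i]] + [spec[i] if i == j else 0.0 for j in range(n)]
     for i in range(n)]

# q chosen so that M q = 0; the leading -1 is an assumption of this sketch.
q = [-1.0] + [l / m for l, m in zip(loadings, spec)]
qq = sum(v * v for v in q)
B = [[(1.0 if i == j else 0.0) - 2 * q[i] * q[j] / qq for j in range(n + 1)]
     for i in range(n + 1)]

# M B should equal M, and B should be orthogonal (B'B = I).
MB = [[sum(M[i][k] * B[k][j] for k in range(n + 1)) for j in range(n + 1)]
      for i in range(n)]
BtB = [[sum(B[k][i] * B[k][j] for k in range(n + 1)) for j in range(n + 1)]
       for i in range(n + 1)]
```

Because B is orthogonal and leaves M unchanged, the new factors have the same unit variances and zero correlations as the old and reproduce every test score, although they are different factors.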

If the matrix M is divided into the part $M_0$ due to 
common factors and the part $M_1$ due to specifics, as in 
equation (9), then Ledermann shows that if U is any 
orthogonal matrix of order equal to the number of com- 
mon factors, the matrix 



wherein Q = 



Mrw J ' 



will satisfy the equation 

MB = M 

Indeterminacy is entirely due to the excess of factors 
over tests, i.e. to the fact that the matrix of loadings M 
is not square. It can be in theory abolished by adding 
a new test which contains no new factor, not even a new 
specific ; or a set of new tests which add fewer factors 
than their number, so that M becomes square (Thomson, 
1934c ; 1935a, 253). In the case of a hierarchy each of 
these tests singly will conform to the hierarchy, so that 
their saturations $l$ can be found ; but jointly they break 
the hierarchy. If they add no new factors, g can then be 
found without any indeterminacy. 

15. Finding g saturations from an imperfectly hierarchical 
battery. The Spearman formula given in Chapter IX, 
Section 5, is the most usual method. A discussion of other 
methods will be found in Burt, 1936, 283-7. See also 
Thomson, 1934a, 370, for an iterative process modified 
from Hotelling. 

16. Sampling errors of tetrad-differences. The formulae 
(16) and (16A) given in the text are both approximations, 
but appear to be very good approximations. The primary 
papers are Spearman and Holzinger, 1924 and 1925. 
Critical examinations of the formulae have been made by 
Pearson and Moul (1927), and Pearson, Jeffery, and Elder- 
ton (1929). Wishart (1928) has considered a quantity $P$ 
which is equal to $P'N^2/(N - 1)(N - 2)$, where $P'$ is the 
tetrad-difference of the covariances $a$ instead of the correla- 
tions, and obtained an exact expression for the standard 
deviation $\sigma$ of $P$ 

$(N - 2)\sigma^2 = \dfrac{N + 1}{N - 1}D_{12}D_{34} - D + 3D_{13}D_{31}$ (55) 
where the D's are determinants of the following matrix 
and its quadrants : 

$\left(\begin{array}{cc|cc} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{array}\right)$ 

But approximate assumptions are necessary when the 
standard deviation of the ordinary tetrad-difference of the 
correlations is deduced from that of $P$. The result for 
the variance of the tetrad-difference is 

$\sigma^2 = \dfrac{1}{N - 2}\left[\dfrac{N + 1}{N - 1}(1 - r_{12}^2)(1 - r_{34}^2) - R\right]$ (56) 

where R is the 4 x 4 determinant of the correlations. 
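
As an illustration of how such a standard deviation would be worked in practice, the sketch below evaluates a tetrad-difference and the variance expression $\frac{1}{N-2}\left[\frac{N+1}{N-1}(1 - r_{12}^2)(1 - r_{34}^2) - R\right]$ for four tests. It assumes that the tetrad concerned is $r_{13}r_{24} - r_{14}r_{23}$, the one that does not involve $r_{12}$ and $r_{34}$; the loadings and the sample size are invented.

```python
import math

def det(m):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def tetrad_and_sd(r, n):
    """Tetrad-difference r13*r24 - r14*r23 and its approximate standard deviation."""
    t = r[0][2] * r[1][3] - r[0][3] * r[1][2]
    var = ((n + 1) / (n - 1) * (1 - r[0][1] ** 2) * (1 - r[2][3] ** 2)
           - det(r)) / (n - 2)
    return t, math.sqrt(var)

# A perfectly hierarchical set of correlations from assumed g saturations.
g = [0.9, 0.8, 0.7, 0.6]
r = [[1.0 if i == j else g[i] * g[j] for j in range(4)] for i in range(4)]
t, sd = tetrad_and_sd(r, 100)
```

For a perfect hierarchy the tetrad-difference is zero, and the standard deviation shows how far from zero a sample value might stray by chance with 100 persons.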

17. Selection from a multivariate normal population. 
The primary papers are those of Karl Pearson (1902 and 
1912). The matrix form given in the text (Chapter XII, 
Section 2) is due to Aitken (1934), who employed Soper's 
device of the moment-generating function, and made a 
free use of the notation and methods of matrices. A 
variant of it which is sometimes useful has been given by 
Ledermann (Thomson and Ledermann, 1938) as follows. 
If the original matrix is subdivided in any symmetrical 
manner : 

$\begin{pmatrix} R_{pp} & R_{pq} & R_{ps} & R_{pt} \\ R_{qp} & R_{qq} & R_{qs} & R_{qt} \\ R_{sp} & R_{sq} & R_{ss} & R_{st} \\ R_{tp} & R_{tq} & R_{ts} & R_{tt} \end{pmatrix}$ 

and $R_{pp}$ is changed by selection to $V_{pp}$, then each resulting 
sub-matrix, including $V_{pp}$ itself, is given by the formula 

$V_{st} = R_{st} - R_{sp}E_{pp}R_{pt}$ 

where $E_{pp} = R_{pp}^{-1} - R_{pp}^{-1}V_{pp}R_{pp}^{-1}$ (57) 

17a. Maximum likelihood estimation. The maximum 
likelihood equations for estimating factor loadings (Lawley, 
1940, 1941, 1943) may be expressed fairly simply in the 
notation of previous sections. It is necessary, however, 
to distinguish between the matrix of observed correla- 
tions, which we shall denote by $R_0$, and the matrix 

$R = M_0M_0' + M_1^2$, 

which represents that part of $R_0$ which is "explained" by 
the factors. 

The equations may then be written 

$M_0' = M_0'R^{-1}R_0$ . . . (58) 

These are not very suitable for computational work. 
It may, however, be shown that 

$M_0'R^{-1} = (I - K)M_0'M_1^{-2} = (I + J)^{-1}M_0'M_1^{-2}$ (59) 

where, as before, 

$K = M_0'R^{-1}M_0$, $J = M_0'M_1^{-2}M_0$. 

Hence our equations may be transformed into the 
form 

$M_0' = (I + J)^{-1}M_0'M_1^{-2}R_0$ (60) 

or alternatively, 

$M_0' = J^{-1}(M_0'M_1^{-2}R_0 - M_0')$ . (61) 

When there are two or more general factors the above 
equations will have an infinite number of solutions corre- 
sponding to all the possible rotations of the factor axes. 
A unique solution may, however, be found such as to 
make J a diagonal matrix. 

Finally, if we put 

$L = M_0'M_1^{-2}R_0 - M_0'$, 
$V = LM_1^{-2}M_0$, 

then, from the last set of equations 

$V = JM_0'M_1^{-2}M_0 = J^2$. 

Hence we have 

$M_0' = V^{-1/2}L$ .... (62) 

These equations have been found the most convenient in 
practice, since they can be solved by an iterative process. 
When first approximations to $M_0$ and $M_1$ have been ob- 
tained, they can be used to provide second approximations 
by substitution in the right-hand side. 
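
One pass of the substitution can be written out for the case of a single common factor, where V is a single number. The sketch below is an illustration under that assumption, with a hypothetical function name; it takes the specific variances to be one minus the squared loadings, and it is tried on a correlation matrix built to fit one factor exactly.

```python
def lawley_step(loadings, r_obs):
    """One pass of M_0' = V^(-1/2) L for a single common factor."""
    n = len(loadings)
    unique = [1 - l * l for l in loadings]                  # diagonal of M_1^2
    weights = [l / u for l, u in zip(loadings, unique)]     # M_0' M_1^-2
    # L = M_0' M_1^-2 R_0 - M_0'
    L = [sum(weights[i] * r_obs[i][j] for i in range(n)) - loadings[j]
         for j in range(n)]
    V = sum(L[j] * weights[j] for j in range(n))            # L M_1^-2 M_0
    return [v / V ** 0.5 for v in L]

true = [0.9, 0.8, 0.6]                                      # assumed loadings
r_obs = [[1.0 if i == j else true[i] * true[j] for j in range(3)]
         for i in range(3)]
after_one_pass = lawley_step(true, r_obs)
```

When the observed correlations fit one factor exactly, the true loadings are returned unchanged, as a solution of the equations should be. Feeding the output back in repeatedly from rougher starting values is the iterative process described above; how quickly it settles is not examined in this sketch.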

18. Reciprocity of loadings and factors in persons and 
traits (Burt, 1937b). Let W be a matrix of scores centred 
both by rows and columns. Its dimensions are traits x 
persons (t . p), and its rank is r where r is smaller than 
both t and p in consequence of the double centring. The 
two matrices of covariances are $WW'$ for traits and $W'W$ 
for persons, and by a theorem first enunciated by Sylvester 
in 1883 (independently discovered by Burt), their non-zero 
latent roots are the same. If their dimensions differ, 
i.e. $t \neq p$, the larger one will have additional zero roots. 
Let the non-zero roots form the diagonal matrix D. Then 
the principal axes analyses are : 

$W = H_1D^{1/2}F_1$, dimensions (t . r)(r . r)(r . p) 
and $W' = H_2D^{1/2}F_2$, dimensions (p . r)(r . r)(r . t) 

where $H_1$ and $H_2$ are the latent vectors of $WW'$ and $W'W$, 
while $F_1$ is the matrix of factors possessed by persons, 
$F_2$ that of factors possessed by traits. From the analysis 
of W we have, taking the transpose 

$W' = F_1'D^{1/2}H_1'$, dimensions (p . r)(r . r)(r . t) 

and comparison of this with the former expression for $W'$ 
makes the reciprocity of $H_2$ and $F_1'$, $F_2$ and $H_1'$, evident. 

19. Oblique factors. Structure and pattern. In Thur- 
stone's notation, which we shall follow in this paragraph, 
the matrix M of our equation (3), when it refers to centroid 
factors, is called F. Our equation (3) becomes in his 
notation 

$s = Fp$. 

Since centroid factors are orthogonal, F is both a pattern 
and a structure. The structure is the matrix of correla- 
tions between tests and factors, i.e. : 

Structure $= sp' = (Fp)p' = F(pp') = FI = F =$ Pattern. 

When the factors are oblique, however, this is not the 
case. In that case, Structure = Pattern x matrix of 
correlations between the factors. 

Thurstone turns the centroid factors to a new set of 
positions (still within the common-factor space, and in 
general oblique to one another) called reference vectors. 
The rotating matrix is $\Lambda$, and 

$V = F\Lambda$ . . . (63) 

is the structure on the reference vectors. The cosines of 
the angles between the reference vectors are given by $\Lambda'\Lambda$. 
V is not a pattern. Its rows cannot be used as coefficients 
in equations specifying a man's scores in the tests, given 
his scores in the reference vectors. The pattern on the 
reference vectors would not have those zeros which are 
found in V. 

The primary factors are the lines of intersection of the 
hyperplanes which are at right angles to the reference 
vectors, taken (r - 1) at a time where r is the number of 
common factors, the number of dimensions in the common- 
factor space. They are defined, therefore, by the equations 
of the hyperplanes, taken (r - 1) at a time. These 
equations are 

$\Lambda'x = 0$ . . . (64) 

where $x$ is a column vector of co-ordinates along the 
centroid axes. The direction cosines of the intersections 
of these hyperplanes taken (r - 1) at a time are therefore 
proportional to the elements in the columns of $(\Lambda')^{-1}$, 
and to make them into direction cosines this has to have 
its columns normalized by post-multiplication by a diagonal 
matrix D, giving for the structure on the primary factors 

$F(\Lambda')^{-1}D$ . . . (65) 

D is also the matrix of correlations between the reference 
vectors and the primary factors, for 

$\Lambda'(\Lambda')^{-1}D = D$ . . . (66) 

Each primary factor is therefore correlated with its own 
reference vector but orthogonal to all the others, as can 
also be easily seen geometrically. 

The matrix of intercorrelations of the primary factors 
is $D\Lambda^{-1}(\Lambda')^{-1}D$ from equation (65). 

If W is the pattern on the primary factors p, so that 

test scores $s = Wp$, 

then the structure on the primary factors is also 

$sp' = Wpp'$ 

where $pp'$ is the matrix of correlations between the primary 
factors, and therefore 

primary factor structure $= WD\Lambda^{-1}(\Lambda')^{-1}D$ . . (67) 

Also, this structure $= F(\Lambda')^{-1}D$ from (65). 

Equating these we have : 

$WD\Lambda^{-1} = F$ 
whence $W = F\Lambda D^{-1}$ . . . (68) 
$= VD^{-1}$ . . . (69) 

We have, therefore, 

                       Structure               Pattern 
Reference vectors      $F\Lambda$              $F(\Lambda')^{-1}$ 
Primary factors        $F(\Lambda')^{-1}D$     $F\Lambda D^{-1}$        (70) 

where the reference-vector pattern has been entered 
by analogy but could easily be independently found. 
It will be seen that the structure and pattern of the 
primary factors are identical with the pattern and struc- 
ture of the reference vectors except for the diagonal 
matrix D. The structure of the one is the pattern of the 
other multiplied by D. 

This theorem is not confined to the case of simple 
structure, but is more general, and applies to any two sets 
of oblique axes with the same origin O, of which the axes 
of the one set are intersections of "primes" taken r - 1 
at a time in the space of r dimensions, and the axes of the 
other set are lines perpendicular to those primes. By 
prime is meant a space of one dimension less than the whole, 
i.e. Thurstone's hyperplane. The projections of any point 
P on to the one set of axes are identical with the projections 
thereon of its oblique co-ordinates on the other set, which 
sentence is equivalent to the matrix identities (see 70) 

$F\Lambda = F\Lambda D^{-1} \times D$, 

and $F(\Lambda')^{-1}D = F(\Lambda')^{-1} \times D$, 

Structure on one set = Pattern on other set x Cosines to 
project it on to the first set. 

A diagram makes this obvious in the two-dimensional case 
and gives the key to the situation. A perspective diagram 
of the three-dimensional case is not very difficult to make 
and is still more illuminating. The vector (or test) OP 
is the "resultant" of its oblique co-ordinates (the pattern), 
but not of its projections (the structure). It is of interest 
to notice that, either on the reference vectors or on the 
primary factors 

Pattern x Transpose of Structure = Test-correlations. 

This serves as a useful check on calculations. It is geo- 
metrically immediately obvious. For consider a space 
defined by n oblique axes, with origin O, and any two 
points P and Q each at unit distance from O. The direc- 
tions OP and OQ may be taken as vectors corresponding 
to two tests, and cos POQ to the test correlation. 

Consider the pattern, on these axes, of OP, and the 
structure, on the same axes, of OQ. The former is com- 
posed of the oblique co-ordinates of the point P, the latter 
of the projections on the axes of the point Q, which pro- 
jections (OQ being unity) are cosines. Then the inner 
product of those oblique co-ordinates of P with these cosines 
obviously adds up to the projection of OP on OQ, that is 
to cos POQ, or the correlation coefficient. 
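
The check can be tried on a toy case in two dimensions. In the sketch below the centroid loadings F and the directions of the two reference vectors are invented; the rotating matrix is given unit-length columns, and the diagonal matrix D is found by normalizing the columns of the inverse of its transpose. Since only common factors are involved, the product reproduces the correlations as accounted for by the common factors, that is FF', with communalities in the diagonal.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

F = [[0.8, 0.3], [0.7, -0.2], [0.5, 0.6]]           # assumed centroid loadings
angles = [math.radians(20), math.radians(100)]      # assumed reference-vector directions
Lam = [[math.cos(a) for a in angles],
       [math.sin(a) for a in angles]]               # rotating matrix, unit columns

T = inv2(transpose(Lam))                            # inverse of the transposed rotation
norms = [math.sqrt(T[0][j] ** 2 + T[1][j] ** 2) for j in range(2)]
D = [[1 / norms[0], 0.0], [0.0, 1 / norms[1]]]      # normalizes the columns of T
D_inv = [[norms[0], 0.0], [0.0, norms[1]]]

ref_structure = matmul(F, Lam)                      # structure on reference vectors
primary_structure = matmul(matmul(F, T), D)         # structure on primary factors
primary_pattern = matmul(matmul(F, Lam), D_inv)     # pattern on primary factors

common = matmul(F, transpose(F))                    # F F'
check = matmul(primary_pattern, transpose(primary_structure))
```

Here `check` agrees with `common`, and multiplying `primary_pattern` by D gives back `ref_structure`, the reciprocal relation between the two sets of axes.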

In estimating oblique factors by regression, since the 
correlations between factors and tests must be used, the 
relevant equation is 

$\hat{f} = \{F(\Lambda')^{-1}D\}'R^{-1}z$ (70.1) 

Ledermann's short cut (section 10a above) requires consider- 
able modification for oblique factors. We no longer have 

$R = M_0M_0' + M_1^2$ . . . (10) 

but 

Pattern x transpose of structure $+ M_1^2 = R$, 

i.e. in Thurstone's notation 

$(F\Lambda D^{-1})\{F(\Lambda')^{-1}D\}' + F_1^2 = R$, (70.2) 

and using this (Thomson 1949), we reach the equation 

$\hat{f} = (I + J)^{-1}\{F(\Lambda')^{-1}D\}'F_1^{-2}z$, . (70.3) 

where now 

$J = \{F(\Lambda')^{-1}D\}'F_1^{-2}(F\Lambda D^{-1})$, (70.4) 

in place of Ledermann's $J = M_0'M_1^{-2}M_0$. 

Only reciprocals of matrices of order equal to the 
number of common factors are now required, but the 
calculation, like all concerning oblique factors, is still one 
of considerable labour. 

19a. Second-order factors. The above primary factors 
can themselves in their turn be factorized into one, two, or 
more second-order factors, and a factor-specific for each 
primary. If the rank of the matrix of intercorrelations 
of the primaries can be reduced by diagonal entries to say 
two, then the r primaries will be replaced by r + 2 second- 
order factors which will no longer be in the original 
common-factor space. The correlations of the primaries 
with these second-order factors will form an oblong matrix 
with its first two columns filled, but each succeeding 
column will have only one entry corresponding to a factor- 
specific, thus : 

$\begin{pmatrix} r & r & r & . & . & . & . \\ r & r & . & r & . & . & . \\ r & r & . & . & r & . & . \\ r & r & . & . & . & r & . \\ r & r & . & . & . & . & r \end{pmatrix} = E$ (say), 

where subscripts must be supplied to indicate the primary 
(the row) and the second-order factor (the column). 

The primary factors can be thought of as added to the 
actual tests, their direction cosines being added as rows 
below F, which thus becomes : 

$\begin{pmatrix} F \\ D\Lambda^{-1} \end{pmatrix}$ 

Imagine this matrix post-multiplied by a rotating matrix 
$\Psi$, with r rows and r + 2 columns, which will give the 
correlations with the r + 2 second-order factors. The 
lower part of the resulting matrix will be E, which we 
already know. That is 

$D\Lambda^{-1}\Psi = E$ ... (71) 

$\Psi = \Lambda D^{-1}E$ . . . (72) 

and the correlations of the original tests with the second- 
order factors are then : 

$G = F\Psi = F\Lambda D^{-1}E = VD^{-1}E$ . (73) 

G is both a structure and a pattern, with continuous 
columns equal in number to the general second-order 
factors, followed by a number of columns equal to the 
number of primaries, this second part forming an orthog- 
onal simple structure. 

20. Boundary conditions. These refer to the conditions 
under which a matrix of correlation coefficients can be 
explained by orthogonal factors which run each through 
only a given number of tests. The problem was first 
raised by Thomson (1919b) and a beginning made with 
its solution (J. R. Thompson, Appendix to Thomson's 
paper). Various papers by J. R. Thompson culminated 
in that of 1929, and see also Black, 1929. Thomson 
returned to the problem in connexion with rotations in the 
common-factor space (Thomson, 1936b), and Ledermann 
gave rigorous proofs of the theorems enunciated by 
Thomson and Thompson and extended them (Ledermann, 
1936). A necessary condition is that if the largest latent 
root of the matrix of correlations exceeds the integer s, 
then factors which run through s tests only and have zero 
loadings in the other tests are certainly inadequate. This 
rule has not been proved to be sufficient, and when applied 
to the common-factor space only it is certainly not suf- 
ficient, though it seems to be a good guide. Ledermann 
(1936, 170-4) has given a stringent condition as follows. 
If we define the nullity of a square matrix as order minus 
rank, then if it is to be possible to factorize orthogonally a 
matrix R of rank r in such a way that the matrix of load- 
ings contains at least r zeros in each of its columns, the 
sum of the nullities of all the r-rowed principal minors of 
R must at least be equal to r. 

21. The sampling of bonds. The root idea is that of the 
complete family of variates that can be made by all possible 
additive combinations of bonds from a given pool, and 
the complete family of correlation coefficients between 
pairs of these. Thomson (1927b) mooted the idea and 
worked out the example quoted in Chapter XX. He 
had earlier (1927a) shown that with all-or-none bonds the 
most probable value of a correlation coefficient is $\sqrt{p_1p_2}$, 
where the p's are fractions of the whole pool forming the 
variates, and the most probable value of a tetrad-difference 
$F$, zero. Mackie (1928a) showed that the mean tetrad- 
difference is zero, and its variance, for $F$, 



PlP^P* + PlPtP* + PzP'tP*) + 

2(N 2) 



where N is the number of bonds in the whole pool. He 
found for the mean value of $r_{12}$ the value $\sqrt{p_1p_2}$, and for 
its variance 

$\sigma_{r_{12}}^2 = \dfrac{(1 - p_1)(1 - p_2)}{N - 1}$ 

This is not the variance of all possible correlation 
coefficients, but of those formed by taking fractions $p_1$ and 
$p_2$ of the pool. The whole family of correlation coefficients 
will be widely scattered by reason of the different values 
of p, "rich" tests having high correlations, and those 
with low p, low correlations. Mackie (1929) next extended 
these formulas to variable coefficients (i.e. bonds which no 
longer were all-or-none). He again found the mean value 
of $F$ to be zero, and for its variance 



The presence of $2/\pi$ in this is due to Mackie's limitation to 
positive loadings of the bonds. Thomson (1935b, 72) 
removed this limitation and found 

Similarly, Mackie found for variable positive loadings 
(1929) 



= -(i-(-Yl 

r N t w / 



and for all loadings Thomson found (1935b) 

$\sigma_r^2 = \dfrac{1}{N}$ 

Thomson suggested without proof that in general, when 
limits are set to the variability of the loadings of the bonds, 
resulting in a family of correlation coefficients averaging r, 
these correlations will form a distribution with variance 

- 2 - /I _ yZ\ 

r ~~ AT 

and will give tetrad-differences averaging zero with a 
variance 

2) 



Summing up, Thomson says (1935b, 77-8) : "The sam- 
pling principle taken alone gives correlations of all values 
. . . and zero tetrad-differences if N be large. Fitting the 
sampled elements with weights ... if the weights may 
be any weights . . . destroys correlation when N is infinite. 
This means that on the Sampling Theory a certain approxi- 
mation to 'all-or-none-ness' is a necessary assumption 
not to explain zero tetrad-differences, but to explain 
the existence of correlations of ... large size. . . . The 
most important point in all this appears to me to be the 
fact that on all these hypotheses the tetrad-differences tend to 
vanish. This tendency appears to be a natural one among 
correlation coefficients." 

A tendency for tetrad-differences to vanish means, of 
course, a still stronger tendency for large minors of the 
correlational matrix to vanish. In more general terms, 
therefore, Thomson's theorem is that in a complete family 
of correlation coefficients the rank of the correlation matrix 
tends towards unity, and that a random sample of variates 
from this family will (in less strong measure) show the 
same tendency. 
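
The all-or-none case lends itself to an exact check by enumeration. The sketch below assumes the simplest reading of the sampling model: a pool of N equally weighted bonds, two tests drawing fixed numbers of bonds at random, and a correlation equal to the number of shared bonds divided by the geometric mean of the two numbers drawn. The pool size and the numbers drawn are invented.

```python
from math import comb, sqrt

def overlap_distribution(n_pool, n1, n2):
    """Probability of each overlap k when tests draw n1 and n2 bonds at random."""
    total = comb(n_pool, n2)
    return {k: comb(n1, k) * comb(n_pool - n1, n2 - k) / total
            for k in range(max(0, n1 + n2 - n_pool), min(n1, n2) + 1)}

def correlation_moments(n_pool, n1, n2):
    """Mean and variance of r = k / sqrt(n1 * n2) over all possible drawings."""
    dist = overlap_distribution(n_pool, n1, n2)
    scale = sqrt(n1 * n2)
    mean = sum(p * k / scale for k, p in dist.items())
    var = sum(p * (k / scale - mean) ** 2 for k, p in dist.items())
    return mean, var

N, n1, n2 = 20, 10, 5            # pool size and bonds per test (assumed values)
p1, p2 = n1 / N, n2 / N
mean, var = correlation_moments(N, n1, n2)
```

Under these assumptions the mean correlation comes out as the square root of $p_1p_2$ and the variance as $(1 - p_1)(1 - p_2)/(N - 1)$, so that the scatter of the correlation about its expected value shrinks as the pool grows.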



REFERENCES 

This list is not a bibliography, and makes no pretensions to com- 
pleteness. It has, on the contrary, been kept as short as possible, 
and in any case contains hardly any mention of experimental articles. 
Other references will be found in the works here listed. 

References to this list in the text are given thus (Mackie, 1929, 
17), or, where more than one article by the same author comes in 
the same year (Burt, 1937b, 84). Throughout the text, however, 
the two important books by Spearman and by Thurstone are 
referred to by the short titles Abilities and Vectors respectively, and 
Thurstone's later book, Multiple Factor Analysis, by the abbrevia- 
tion M. F. Anal. Other abbreviations are : 

A.J.P. = American Journal of Psychology. 
B.J.P. = British Journal of Psychology, General Section. 
B.J.P. Statist. = British Journal of Psychology, Statistical Section. 
B.J.E.P. = British Journal of Educational Psychology. 
J.E.P. = Journal of Educational Psychology. 
Pmka. = Psychometrika. 

AITKEN, A. C., 1934, " Note on Selection from a Multi-variate 
Normal Population," Proc. Edinburgh Math. Soc., 4, 106-10. 

1937a, " The Evaluation of a Certain Triple-product Matrix," 
Proc. Roy. Soc. Edinburgh, 57, 172-81. 

1937&, " The Evaluation of the Latent Roots and Vectors of a 
Matrix," ibid., 57, 269-304. 

ALEXANDER, W. P., 1935, " Intelligence, Concrete and Abstract," 
B.J.P. Monograph Supplement 19. 

ANASTASI, Anne, 1936, " The Influence of Specific Experience 
upon Mental Organization," Genetic Psychol. Monographs 8, 
245-355. 
and Garrett (see under Garrett). 

BAILES, S., and Thomson (see under Thomson). 

BARTLETT, M. S., 1935, " The Statistical Estimation of G," B.J.P.,
26, 199-206.

1937a, " The Statistical Conception of Mental Factors," ibid., 28,
97-104.

1937b, " The Development of Correlations among Genetic Components
of Ability," Annals of Eugenics, 7, 299-302.

1938, " Methods of estimating Mental Factors," Nature, 141,
609-10.

1948, " Internal and External Factor Analysis," B.J.P. Statist.,
1, 73-81.


BLACK, T. P., 1929, "The Probable Error of Some Boundary
Conditions in diagnosing the Presence of Group and General
Factors," Proc. Roy. Soc. Edinburgh, 49, 72-7.

BLAKEY, R., 1940, " Re-analysis of a Test of the Theory of Two
Factors," Pmka., 5, 121-36.

BROWN, W., 1910, " The Correlation of Mental Abilities," B.J.P., 3,
296-322.

and Stephenson, W., 1933, "A Test of the Theory of Two
Factors," ibid., 23, 352-70.

1911, and with Thomson, G. H., 1921, 1925, and 1940, The
Essentials of Mental Measurement (Cambridge).

BURT, C., 1917, The Distribution and Relations of Educational
Abilities (London).

1936, " The Analysis of Examination Marks," a memorandum in 
The Marks of Examiners (London), by Hartog and Rhodes. 

1937a, " Methods of Factor Analysis with and without Successive 
Approximation," B.J.E.P., 7, 172-95. 

1937b, " Correlations between Persons," B.J.P., 28, 59-96.

1938a, " The Analysis of Temperament," B. J. Medical P., 17, 
158-88. 

1938b, " The Unit Hierarchy," Pmka., 3, 151-68.

1939, " Factorial Analysis. Lines of Possible Reconcilement," 
B.J.P., 30, 84-93. 

1940, The Factors of the Mind (London). 

and Stephenson, W., " Alternative Views on Correlations between
Persons," Pmka., 4, 269-82.

CATTELL, R. B., 1944a, "Cluster Search Methods," Pmka., 9, 169-84.

1944b, " Parallel Proportional Profiles," Pmka., 9, 267-83.

1945, " The Principal Trait Clusters for Describing Personality," 
Psychol. Bull, 42, 129-61. 

1946, Description and Measurement of Personality (New York). 

1948, " Personality Factors in Women," B.J.P. Statist., 1,
114-130.

COOMBS, C. H., 1941, " Criterion for Significant Common Factor
Variance," Pmka., 6, 267-72.

DAVEY, Constance M., 1926, " A Comparison of Group, Verbal, and
Pictorial Intelligence Tests," B.J.P., 17, 27-48.

DAVIS, F. B., 1945, " Reliability of Component Scores," Pmka., 10,
57-60.

DODD, S. C., 1928, " The Theory of Factors," Psychol. Rev., 35,
211-34 and 261-79.

1929, " The Sampling Theory of Intelligence," B.J.P., 19, 306-27.

EMMETT, W. G., 1936, " Sampling Error and the Two-factor Theory,"
B.J.P., 26, 362-87.

1949, " Factor Analysis by Lawley's Method of Maximum Likelihood,"
B.J.P. Statist., 2, (2), 90-7.

FERGUSON, G. A., 1941, "The Factorial Interpretation of Test
Difficulty," Pmka., 6, 323-29.




FISHER, R. A., 1922, " Goodness of Fit of Regression Formulae,"
Journ. Roy. Stat. Soc., 85, 597-612.

1925, " Applications of 'Student's' Distribution," Metron, 5 (3),
3-17.

1925 and later editions, Statistical Methods for Research Workers
(Edinburgh).

1935 and later editions, The Design of Experiments (Edinburgh).

and Yates, F., 1938 and later editions, Statistical Tables
(Edinburgh).

GARNETT, J. C. M., 1919a, " On Certain Independent Factors in
Mental Measurement," Proc. Roy. Soc., A, 96, 91-111.

1919b, " General Ability, Cleverness, and Purpose," B.J.P., 9,
345-66.

GARRETT, H. E., and Anastasi, Anne, 1932, " The Tetrad-difference
Criterion and the Measurement of Mental Traits," Annals New
York Acad. Sciences, 33, 233-82.

HEYWOOD, H. B., 1931, " On Finite Sequences of Real Numbers,"
Proc. Roy. Soc., A, 134, 486-501.
HOLZINGER, K. J., 1935, Preliminary Reports on Spearman-Holzinger
Unitary Trait Study, No. 5 ; Introduction to Bifactor
Theory (Chicago).

1937, Student Manual of Factor Analysis (Chicago) (assisted by
Frances Swineford and Harry Harman).

1940, " A Synthetic Approach to Factor Analysis," Pmka., 5,
235-50.

1944, " A Simple Method of Factor Analysis," Pmka., 9, 257-62.

and Harman, H. H., 1937b, "Relationships between Factors
obtained from Certain Analyses," J.E.P., 28, 321-45.

and Harman, H. H., 1938, " Comparison of Two Factorial
Analyses," Pmka., 3, 45-60.

and Harman, H. H., 1941, Factor Analysis (Chicago).

and Spearman (see under Spearman).
HORST, P., 1941, " A Non-graphical Method for Transforming into
Simple Structure," Pmka., 6, 79-100.

HOTELLING, H., 1933, "Analysis of a Complex of Statistical Variables
into Principal Components," J.E.P., 24, 417-41 and 498-520.

1935a, " The Most Predictable Criterion," J.E.P., 26, 139-42.

1935b, " Simplified Calculation of Principal Components," Pmka.,
1, 27-35.

1936, " Relations between Two Sets of Variates," Biometrika, 28,
321-77.

IRWIN, J. O., 1932, " On the Uniqueness of the Factor g for General
Intelligence," B.J.P., 22, 359-63.

1933, " A Critical Discussion of the Single-factor Theory," ibid.,
23, 371-81.

KELLEY, T. L., 1923, Statistical Method (New York).

1928, Crossroads in the Mind of Man (Stanford and Oxford).

1935, Essential Traits of Mental Life (Harvard).




LANDAHL, H. D., 1938, " Centroid Orthogonal Transformations,"
Pmka., 3, 219-23.

LAWLEY, D. N., 1940, " The Estimation of Factor Loadings by the
Method of Maximum Likelihood," Proc. Roy. Soc. Edin., 60,
64-82.

1941, " Further Investigations in Factor Estimation," ibid., 61,
176-85.

1943a, " On Problems Connected with Item Selection and Test
Construction," ibid., 61, 273-87.

1943b, " The Application of the Maximum Likelihood Method to
Factor Analysis," B.J.P., 33, 172-175.

1943c, " A Note on Karl Pearson's Selection Formulae," Proc. Roy.
Soc. Edin., 62, 28-30.

1944, " The Factorial Analysis of Multiple Item Tests," ibid., 62,
74-82.

1949, " Problems in Factor Analysis," ibid., 62, Part IV,
(No. 41).

LEDERMANN, W., 1936, " Mathematical Remarks concerning Boundary
Conditions in Factorial Analysis," Pmka., 1, 165-74.

1937a, " On the Rank of the Reduced Correlational Matrix in
Multiple-factor Analysis," ibid., 2, 85-93.

1937b, " On an Upper Limit for the Latent Roots of a Certain
Class of Matrices," J. Lond. Math. Soc., 12, 14-18.

1938a, " A Shortened Method of Estimation of Mental Factors
by Regression," Nature, 141, 650.

1938b, " Note on Professor Godfrey Thomson's Article on the
Influence of Univariate Selection on Factorial Analysis,"
B.J.P., 29, 69-73.

1938c, " The Orthogonal Transformations of a Factorial Matrix
into Itself," Pmka., 3, 181-87.

1939, " Sampling Distribution and Selection in a Normal Population,"
Biometrika, 30, 295-304.

1939b, " A Shortened Method of Estimation of Mental Factors by
Regression," Pmka., 4, 109-16.

1940, " A Problem Concerning Matrices with Variable Diagonal
Elements," Proc. Roy. Soc. Edin., 60, 1-17.

and Thomson (see under Thomson).
MACKIE, J., 1928a, " The Probable Value of the Tetrad-difference
on the Sampling Theory," B.J.P., 19, 65-76.

1928b, " The Sampling Theory as a Variant of the Two-factor
Theory," J.E.P., 19, 614-21.

1929, " Mathematical Consequences of Certain Theories of Mental
Ability," Proc. Roy. Soc. Edinburgh, 49, 16-37.

MCNEMAR, Q., 1941, " On the Sampling Errors of Factor Loadings,"
Pmka., 6, 141-52.

1942, " On the Number of Factors," ibid., 7, 9-18.

MEDLAND, F. F., 1947, "An empirical comparison of Methods of
Communality Estimation," Pmka., 12, 101-10.




MOSIER, C. I., 1939, " Influence of Chance Error on Simple Struc- 
ture," Pmka., 4, 33-44. 

PEARSON, K., 1902, " On the Influence of Natural Selection on 
the Variability and Correlation of Organs," Phil. Trans. Roy. 
Soc. London, 200, A, 1-66. 

1912, " On the General Theory of the Influence of Selection on 
Correlation and Variation," Biometrika, 8, 437-43. 

and Filon, L. N. G., 1898, " On the Probable Errors of Frequency 
Constants and on the Influence of Random Selection on Varia- 
tion and Correlation," Phil. Trans. Roy. Soc. London, 191, A, 
229-311. 

Jeffery, G. B., and Elderton, E. M., 1929, " On the Distribution of
the First Product-moment Coefficient, etc.," Biometrika, 21,
191-2.

and Moul, M., 1927, " The Sampling Errors in the Theory of a
Generalized Factor," ibid., 19, 246-91.

PEEL, E. A., 1947, " A short method for calculating Maximum
Battery Reliability," Nature, 159, 816.

1948, " Prediction of a Complex Criterion and Battery Reliability,"
B.J.P. Statist., 1, 84-94.

PIAGGIO, H. T. H., 1933, " Three Sets of Conditions Necessary for
the Existence of a g that is Real," B.J.P., 24, 88-105.

PRICE, B., 1936, " Homogamy and the Intercorrelation of Capacity
Traits," Annals of Eugenics, 7, 22-7.

REYBURN, H. A., and Taylor, J. G., 1939, " Some Factors of
Personality," B.J.P., 30, 151-65.

1941a, " Some Factors of Intelligence," ibid., 31, 249-61.

1941b, " Factors in Introversion and Extraversion," ibid., 31,
335-40.

1943a, " On the Interpretation of Common Factors : a Criticism
and a Statement," Pmka., 8, 53-64.

1943b, " Some Factors of Temperament : a Re-examination,"
ibid., 8, 91-104.

SPEARMAN, C., 1904, " General Intelligence objectively Determined 
and Measured," A.J.P., 15, 201-93. 

1913, " Correlation of Sums or Differences," B.J.P., 5, 417-26. 

1914, " The Theory of Two Factors," Psychol. Rev., 21, 101-15.

1926 and 1932, The Abilities of Man (London).

1928, " The Substructure of the Mind," B.J.P., 18, 249-61.

1931, " Sampling Error of Tetrad-differences," J.E.P., 22, 388.

1939a, " Thurstone's Work Reworked," J.E.P., 30, 1-16.

1939b, " Determination of Factors," B.J.P., 30, 78-83.

and Hart, B., 1912, " General Ability, its Existence and Nature,"
B.J.P., 5, 51-84.

and Holzinger, K. J., 1924, " The Sampling Error in the Theory
of Two Factors," B.J.P., 15, 17-19.

and Holzinger, K. J., 1925, " Note on the Sampling Error of
Tetrad-differences," ibid., 16, 86-8.




and Holzinger, K. J., 1929, " Average Value for the
Probable Error of Tetrad-differences," ibid., 20, 368-70.

STEPHENSON, W., 1931, " Tetrad-differences for Verbal Sub-tests 
relative to Non-verbal Sub-tests," J.E.P., 22, 334-50. 

1935a, " The Technique of Factor Analysis," Nature, 136, 297. 

1935b, " Correlating Persons instead of Tests," Character and
Personality, 4, 17-24.

1936a, " A new Application of Correlation to Averages," B.J.E.P.,
6, 43-57.

1936b, " The Inverted-factor Technique," B.J.P., 26, 344-61.

1936c, " Introduction to Inverted-factor Analysis, with some
Applications to Studies in Orexis," J.E.P., 27, 353-67.

1936d, " The Foundations of Psychometry : Four Factor Sys- 
tems," Pmka., 1, 195-209. 

1939, " Abilities Defined as Non-fractional Factors," B.J.P., 30, 
94-104. 

and Brown (see under Brown).

and Burt, C. (see under Burt). 

SWINEFORD, Frances, 1941, " Comparisons of the Multiple-factor
and Bifactor Methods of Analysis," Pmka., 6, 375-82 (see also
Holzinger, 1937a).

THOMPSON, J. R. (see under Thomson, 1919b).

1929, " The General Expression for Boundary Conditions and the
Limits of Correlation," Proc. Roy. Soc. Edinburgh, 49, 65-71.

THOMSON, G. H., 1916, " A Hierarchy without a General Factor," 
B.J.P., 8, 271-81. 

1919a, " On the Cause of Hierarchical Order among Correlation
Coefficients," Proc. Roy. Soc., A, 95, 400-8.

1919b, " The Proof or Disproof of the Existence of General
Ability" (with Appendix by J. R. Thompson), B.J.P., 9,
321-36.

1927a, " The Tetrad-difference Criterion," B.J.P., 17, 235-55.

1927b, " A Worked-out Example of the Possible Linkages of
Four Correlated Variables on the Sampling Theory," ibid., 18,
68-76.

1934a, " Hotelling's Method modified to give Spearman's g,"
J.E.P., 25, 366-74.

1934b, " The Meaning of i in the Estimate of g," B.J.P., 25,
92-9.

1934c, " On measuring g and s by Tests which break the
g-hierarchy," ibid., 25, 204-40.

1935a, " The Definition and Measurement of g (General Intelligence),"
J.E.P., 26, 241-62.

1935b, " On Complete Families of Correlation Coefficients and
their Tendency to Zero Tetrad-differences : including a Statement
of the Sampling Theory of Abilities," B.J.P., 26, 63-92.

1936a, " Some Points of Mathematical Technique in the Factorial
Analysis of Ability," J.E.P., 27, 37-54.




1936b, "Boundary Conditions in the Common-factor
Space, in the Factorial Analysis of Ability," Pmka., 1, 155-63.

1937, " Selection and Mental Factors," Nature, 140, 934.

1938a, " Methods of estimating Mental Factors," ibid., 141, 246.

1938b, " The Influence of Univariate Selection on the Factorial
Analysis of Ability," B.J.P., 28, 451-9.

1938c, " On Maximizing the Specific Factors in the Analysis of
Ability," B.J.E.P., 8, 255-63.

1938d, " The Application of Quantitative Methods in Psychology,"
see Proc. Roy. Soc. B., 125, 415-34.

1938e, " Recent Developments of Statistical Methods in Psychology,"
Occupational Psychology, 12, 319-25.

1939a, " Factorial Analysis. The Present Position and the
Problems Confronting Us," B.J.P., 30, 71-77. "Agreement and
Disagreement in Factor Analysis. A Summing Up," ibid., 105-8.

1939b, " Natural Variances of Mental Tests, and the Symmetry
Criterion," Nature, 144, 516.

1940a, An Analysis of Performance Test Scores of a Representative
Group of Scottish Children (London).

1940b, "Weighting for Battery Reliability and Prediction,"
B.J.P., 30, 357-66.

1941, "The Speed Factor in Performance Tests," B.J.P., 32,
131-5.

1943a, " A Note on Karl Pearson's Selection Formulae," Math.
Gazette, December, 197-8.

1944, " The Applicability of Karl Pearson's Selection Formulae in
Follow-up Experiments," B.J.P., 34, 105.

and Bailes, S., 1926, " The Reliability of Essay Marks," The Forum
of Education, 4, 85-91.

and Brown, W. (see under Brown).

and Ledermann, W., 1938, " The Influence of Multi-variate
Selection on the Factorial Analysis of Ability," B.J.P.,
29, 288-306.

1947, " Maximum Correlation of Two Weighted Batteries," B.J.P.
Statist., 1, (1), 27-34.

1948, " Relations of Two Weighted Batteries," B.J.P. Statist.,
1, (2), 82-3.

1949, " On Estimating Oblique Factors," B.J.P. Statist., 2, (1), 1-2.

THORNDIKE, E. L., 1925, The Measurement of Intelligence (New 
York). 

THURSTONE, L. L., 1932, The Theory of Multiple Factors (Chicago). 
1933, A Simplified Multiple-factor Method (Chicago). 
1935, The Vectors of Mind (Chicago). 
1938a, " Primary Mental Abilities," Psychometric Monograph No. 1
(Chicago).

1938b, " The Perceptual Factor," Pmka., 3, 1-18.

1938c, " A New Rotational Method," ibid., 3, 199-218.




1940a, " An Experimental Study of Simple
Structure," ibid., 5, 153-68.

1940b, " Current Issues in Factor Analysis," Psychol. Bull., 37,
189-236.

1944a, " Second-order Factors," Pmka., 9, 71-100.

1944b, A Factorial Study of Perception (Chicago).

and Thurstone, T. G., 1941, " Factorial Studies of Intelligence,"
Psychometric Monograph, 2 (Chicago).

1947, Multiple Factor Analysis (Chicago).

THURSTONE, T. G., 1941, " Primary Mental Abilities of Children,"
Educ. and Psychol. Meas., 1, 105-16.

TRYON, R. C., 1932a, " Multiple Factors vs Two Factors as Determiners
of Abilities," Psychol. Rev., 39, 324-51.

1932b, " So-called Group Factors as Determiners of Ability,"
ibid., 39, 403-39.

1935, " A Theory of Psychological Components : an Alternative
to Mathematical Factors," ibid., 42, 425-54.

1939, Cluster Analysis (Berkeley, Cal.).

TUCKER, L. R., 1940, " The Role of Correlated Factors in Factor
Analysis," Pmka., 5, 141-52.

1944, " A Semianalytical Method of Rotation to Simple Structure,"
Pmka., 9, 43-68.

WILSON, E. B., 1928a, " On Hierarchical Correlation Systems,"
Proc. Nat. Acad. Sc., 14, 283-91.

1928b, " Review of The Abilities of Man," Science, 67, 244-8.

1933a, " On the Invariance of General Intelligence," Proc. Nat.
Acad. Sc., 19, 768-72.

1933b, " Transformations preserving the Tetrad Equations," ibid.,
19, 882-4.

1933c, " On Overlap," ibid., 19, 1039-44.

1934, " On Resolution into Generals and Specifics," ibid., 20,
193-6.

and Worcester, Jane, 1934, " The Resolution of Four Tests,"
ibid., 20, 189-92.

and Worcester, Jane, 1939, " Note on Factor Analysis," Pmka.,
4, 133-48.

WISHART, J., 1928, " Sampling Errors in the Theory of Two Factors,"
B.J.P., 19, 180-7.

WORCESTER, Jane, and Wilson (see under Wilson).

WRIGHT, S., 1921, " Systems of Mating," Genetics, 6, 111-78.

YATES, F. (see under Fisher).

YOUNG, G., 1939, " Factor Analysis and the Index of Clustering,"
Pmka., 4, 201-8.

1941, " Maximum Likelihood Estimation and Factor Analysis,"
ibid., 6, 49-53.

and Householder, A. S., 1940, " Factorial Invariance and Significance,"
ibid., 5, 47-56.



INDEX 

The numbers refer to pages. Terms which occur repeatedly 
through the book are only indexed on their first occurrence or 
where they are defined. Most chapter and section headings are
indexed. The names Spearman and Thurstone (which occur so 
frequently) are not indexed, nor is the author's name. 



Acceleration by powering, 74. 
Aitken, 22, 76, 89 ff., 103, 107, 188 (selection formula), 369,
378.
Alexander, 36, 51, 91, 114, 243.
Anarchic doctrine, 42, 47.
Anastasi, 306.
Attenuation, 79, 148.
Axes, 60 (orthogonal), 61 (Spearman), 68 and 363 (principal).



Babington Smith, 353. 

Bailes, 204. 

Bartlett, M. S., 51, 100, 134 ff.
and 369 (Bartlett's estimates),
138 (a numerical calculation of), 
205, 305, 318-19, 365, 366 
(external factorial analysis). 

Bifactors, 19, 286, 343. 

Binet, 4, 51-2, 91. 

Bipolar factors, 332. 

Black, 267, 380. 

Blakey, 169. 

Bonds of the mind, 45 ff., 51. 

Boundary conditions, 262 ff., 380. 

" Box " correlations, 280, 297. 

Brown, 7, 18, 42, 148, 227, 236, 
242. 

Burt, 25, 75, 169, 199, 206 ff. 
(marks of examiners), 213 ff., 
221 (temperament), 289 (covari- 
ances), 330, 332, 347, 372, 375. 



Cattell, 295, 339, 349. 

Centring a matrix, 214, 218 
(special features of double 
centring). 

" Centroid " method, 23, 98 (and 
pooling square), 164 (with 
guessed communalities), 354. 

Circumflex mark (as x̂) indicates
an estimate, see 83 ff.

Cluster analysis, 349. 

Coefficient of richness, 45. 

Common-factor space, 63 ff.

Communalities, 23, 38 (unique), 
161 (approximate), 361. 

Coombs, 169. 

Correlation coefficient, 5 (product- 
moment formula), 57, 64 (as 
cosine), 83 (as estimation coeffi- 
cient), 168 (of correlation coeffi- 
cients), 173 (partial), 199 ff. 
(between persons). 

Cosine as correlation coefficient,
57, 64.

Covariance, 11, 214 (analysis of),
330, 352 (of regression coeffi- 
cients), 372. 

Criterion, 85. 



Davis, 170. 

Degrees of freedom, 148. 

Direction cosines, 283. 




Dodd, 43. 

Doolittle method, 369. 






Elderton, 151, 372. 

Ellipsoids of density, 67, 363. 

Emmett, 151, 152, 329. 

Error specifics, 132, 167. 

Error, standard, see standard 
deviation. 

Estimates, 118 ff. (correlated), 124
(calculation of variances and
covariances), 134 ff. and 369
(Bartlett's).

Estimation, 83 ff., 95 (geometrical
picture of), 102 (of factors
by regression), 110 and 367
(of specifics), 115 (direct or via
factors), 367.

Etherington, 91. 

Extended vectors, 250, 274. 

Factors, 3 (tests), 4 (fictitious), 
14 (group), 15 (verbal), 102 
(estimation by regression), 168 
(limits to number of), 186 
(oblique), 193 (creation of new), 
240 (danger of reifying), 262 ff. 
(limits to extent of), 273 (prim- 
ary), 297 ff. (second order). 

Filon, 167. 

Fisher, 150, 321, 352. 

G, 8 (saturations), 49 (mental 
energy), 225 (definition of), 230 
(pure g), 286. 

Garnett, 20, 31, 57, 354. 

Geometrical picture of correla- 
tion, 55, 66, 353. 

Geometrical picture of estima- 
tion, 95. 

Group factors, 14. 

Harman, 286. 

Hart, 7. 

Heywood, 131, 231, 298. 



Hierarchical order, 5, 234 (when
persons and tests equally numerous).
Histogram, 13, 152.
Hollow staircase pattern, 19.
Holzinger, 19, 151, 152, 286,
343, 346, 369, 372.
Hotelling, 24, 53, 60-1, 66 ff.,
86, 100, 133, 170, 215, 330, 354,
364, 365.

Independence of units, 332 ff. 
Indeterminacy, 336, 371. 
Inequality of men, 53, 313. 
Inner product, 31. 
Invariance of simple structure, 
184, 292, 294. 

Jeffery, 151, 372.
Kelley, 74, 99, 157, 175. 



Landahl, 255.
Latent root, 72-3, 75-6, 215,
267, 363.
Lawley, 169, 195, 321, 334, 373.
Ledermann, 40, 110, 186, 187,
260, 267, 270, 371, 372, 373,
378, 380.
Loadings, 25, 76 (properties of
Hotelling).



Mackie, 53, 312, 314, 381, 382. 

McNemar, 167, 169. 

Matrix, 8, 190 (calculation of re- 
ciprocal), 350. 

Maximum likelihood method, 
169, 321, 373. 

Medland, 162, 352. 

Mental energy, 49, 241. 

Metric, 328. 

Minor determinant, 21. 

Monarchic doctrine, 42, 47. 

Moods, 211.






Moul, 151, 372. 

Multiple correlation, 85 ff., 94-5
(calculation of), 98.
Multiple-factor analysis, 20 ff.,
161 ff.
Multivariate selection, 187, 294.

Natural units, 331.
Negative loadings, 33, 77, 211,
304.
Neural machines, 49.
Normal curve, 145.
Normalized scores, 6, 360.
Normalizing coefficients, 252,
275.

Oblique factors, 186, 192, 195,
261, 272 ff., 292, 339, 375.
Oligarchic doctrine, 42, 47. 
Order of a determinant, 21. 
Orthogonal axes, 60. 
Orthogonal matrices, 291. 
Otis-Kelley formula, 175. 
Oval diagrams, 11. 

Parallel proportional profiles, 
295. 

Parsimony, 15. 

Pattern and structure, 272, 375. 

Pearson, K., 5, 151, 167, 171, 373. 

Peel, E. A., 100, 367. 

Physical measurements and hier- 
archical order, 315. 

Piaggio, 107. 

Pivotal condensation, 22, 89. 

Pooling square, 85 ff., 98 (and 
"centroid" method), 364. 

Price, 305. 

Primary factors, 273, 277. 

Principal components, 53, 60, 
66 ff., 69 (advantages and dis- 
advantages), 71 (calculation of 
loadings), 78 (calculation of
a man's), 170, 363.

Product-moment formula, 5. 



Purification of batteries, 20, 44, 
129, 226. 

Rank of a matrix, 20, 178 (un- 
changed by selection), 183 (the 
same), 306 and 382 (low re- 
duced rank). 

Reciprocity of loadings and 
factors, 217 ff., 375. 

Reference values for detecting 
specific correlation, 156 ff. 

Reference vectors, 272, 277. 

Regression coefficients, 87 ff., 93, 
350 (Aitken's computation).

Regression equation, 92, etc., 
365. 

Reliabilities, 79, 100, 148, 170. 

Residual matrix, 28, 155. 

Reyburn, 170, 287 ff., 339, 349. 

Richness, coefficient of, 45, 331, 
381. 

Rosner, 355.

Rotation, Landahl, 255. 

Rotation of axes, 36, 55, 64, 
243, 247 (graphical method), 
249 ff. (new method), 331, 361, 
379. 

Sampling error, 143 (two factors),
167, 372 (of tetrad-differences).

Sampling theory, 42 ff., 303 ff., 
362, 381. 

Saturations, 8, 17, 153 (Spear- 
man's formula), 372. 

Second-order factors, 297 ff., 379. 

Selection, 171 ff. (univariate), 173 
(partial correlation), 184 (geo- 
metrical picture of), 185 (ran- 
dom), 187 ff. (multivariate). 

Sheppard, 354. 

Sign-changing in " centroid " pro- 
cess, 28, 30. 

Significance of factors, 324, 326. 

Simple structure, 242 ff., 245
(numerical example), 294, 332 
(independent of units). 






Singly conforming tests, 236. 

Spearman weights, 105-6 (cal- 
culation of), 372. 

Specifics, 48 (maximized), 130 
(maximized and minimized), 
132 (error specifics), 336. 

Standard deviation, 5, 149 (of 
variance and of correlation 
coefficient), 352 (of a regression 
coefficient). 

Standardized scores, 6, 10, 360. 

Stephenson, 18, 199, 201, 209, 
211, 212, 227, 236, 242. 

Subpools of the mind, 50, 319. 

Swineford, 170. 

Sylvester, 375. 

Taylor, 170, 287 ff., 339, 349. 
Tetrad-differences, 12, 21. 
Thompson, J. R., 262, 267, 380. 
Thorndike, 91, 307, 317. 
Tripod analogy, 57. 
Tryon, 42, 344, 349. 
Tucker, 169. 
Two-factor theory, 3 ff., 361. 



Unique communalities, 38, 40
(formula).

Units, rational, 331. 
Univariate selection, 171 ff., 292. 

Variance, 5, 11, 318 (absolute), 
331 (natural), 350 (of regres- 
sion coefficients). 

Vectors, 56. 

Verbal factor, 15. 

Vocational guidance or advice,
19, 52, 114, 368.

Weighted battery, 10 (Spear- 
man's weights), 105-6 (cal- 
culation of), 203 (of examin- 
ers). 

Wilson, 41, 57, 110, 168, 367. 

Wishart, 151, 372. 

Worcester, 41, 168. 

Wright, 305. 

Zero loadings, 65. 



